03 January, 2009

Let's calculate some CSI

In a comment on the post The Case for Materialism, Oleg laid down this challenge to Kairosfocus:


Let me illustrate that with an example. Here is a sequence of 60 bits:
1100100100 0011111101 1010101000 1000100001 0110100011 0000100011. Can you tell me whether this information is complex and specified?




I think this is great and would like to go one step further and invite anyone to submit a sample so that an ID expert can calculate the CSI. What better way to clarify exactly what CSI means and establish to what extent it is objective?

I will add another example to the one posed by Oleg. I recognise that it's really difficult to calculate CSI for biological entities, so let's take something extremely straightforward and much used in the literature. What is the CSI of a hand of 13 cards which is all the spades in a single deal of Contract Bridge? I am assuming that we are using the definition of CSI in this paper, which WAD recently confirmed was the definitive account.

The full formula is on page 21: –log2[M·N·ϕS(T)·P(T|H)].

However, I would be content to see the calculation of just the specification component on page 18:

σ = –log2[ϕS(T)·P(T|H)].

Or even just ϕS(T), which is defined on page 17 as:

the number of patterns for which S’s semiotic description of them is at least as simple as S’s semiotic description of T.
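The chance component, at least, is uncontroversial for the bridge example. A minimal Python sketch of P(T|H) under a uniform-deal hypothesis (it deliberately leaves out the contested ϕS(T) term):

    from math import comb, log2

    # Chance hypothesis H: a uniformly random 13-card hand from a 52-card deck.
    hands = comb(52, 13)                  # 635,013,559,600 possible hands
    p = 1 / hands                         # exactly one hand is all thirteen spades
    print(f"P(T|H) = {p:.3e}")            # ~1.575e-12
    print(f"-log2 P(T|H) = {log2(hands):.1f} bits")   # ~39.2 bits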


Thanks in advance to anyone willing to give it a go.

122 Comments:

Anonymous Anonymous said...

Mark:

For the record, I observe here that in your previous thread, I used a far simpler but nonetheless effective metric to address Oleg's case.

His case was 60 bits long, so even if functional and specific in that functionality, it is not CSI, as it is below the relevant range, which on the crude metric begins at 500 - 1,000 functional and specific bits.

For the further record, I have contrasted the case of biofunctional DNA, pointing out that even with parasitic lifeforms that start at about 100,000 4-state elements, we are well within the CSI territory, with 200 k+ functionally specific bits.
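A minimal sketch of the crude metric being described, assuming 2 bits per 4-state element and the 500-1,000 bit band; the function name and threshold handling are illustrative, not kairosfocus's own code:

    from math import log2

    def functionally_specific_bits(n_elements, states=4):
        # assumes each element independently contributes log2(states) bits
        return n_elements * log2(states)

    bits = functionally_specific_bits(100_000)   # minimal parasitic genome
    print(bits)          # 200000.0 functionally specific bits
    print(bits >= 1000)  # True: beyond even the upper end of the 500-1,000 band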

In short, CSI is not only measurable but can be measured in simple as well as sophisticated ways. The material issue is settled at the outset already, then.

And, that is all I need to do here.

G'day.

GEM of TKI

9:18 am  
Anonymous Anonymous said...

So 499 bits are "not CSI" but 500 are?

"CSI is not only measurable but can be measured in simple as well as sophisticated ways."

KF, can you give some examples? If CSI is so simple to determine, then this should not be an issue.

Or will you concede that you cannot give a few examples and show how the CSI was determined?

Does an onion have more CSI than a human due to its unusual genome?

Could you give the CSI in the following examples?

An Onion?
A spermatozoon?
A solid three-dimensional cube?
An unfertilised egg?
A fertilised egg?

If you can't work out the CSI in such "simple" examples, then I don't see that CSI is simple to calculate, and would ask that you retract that specific claim and no longer use it.

11:55 am  
Anonymous Anonymous said...

"For the further record, I have contrasted the case of biofunctional DNA, pointing out that even with parasitic lifeforms that start at about 100,000 4-stage elements, we are well within the CSI territory, with 200 k+ functionally specific bits."

Thus, all biological entities are too complex to have arisen by chance. What a surprise. And what a clear and convincing example of vacuous reasoning.

12:57 pm  
Blogger oleg said...

KF, as I said on the other thread, this string is not limited to 60 bits. I can supply 500, 1000, or 10000 bits of the sequence if necessary for the analysis.

All we want to see is how CSI is actually determined in a very simple case. If IDers cannot determine CSI even in simple cases, all their talk about CSI in biology is meaningless.

2:32 pm  
Blogger Zachriel said...

Mark Frank: What is the ϕS of a hand of 13 cards which is all the spades in a single deal of Contract Bridge?

I'm just going to throw out a few descriptions without regard to length.

7-spades.
Laydown.
Grand slam.
Was that a penny a point?
Just an average hand in high-cards.
Oops! Somebody forgot to shuffle.

2:51 pm  
Blogger Zachriel said...

kairosfocus: His case was 60 bits long, so even if functional and specific in that functionality, it is not CSI, as it is below the relevant range, which on the crude metric begins at 500 - 1,000 functional and specific bits.

Dembski's definition of CSI does not require a determination of functionality. And according to that definition, we can determine the *specified complexity* for sequences shorter than 500 bits.

What is the *specified complexity* of olegt's example?

2:59 pm  
Blogger Zachriel said...

Kairosfocus has commented on Uncommon Descent.

kairosfocus: His case was 60 bits long, so even if functional and specific in that functionality, it is not CSI, as it is below the relevant range, which on the crude metric begins at 500 - 1,000 functional and specific bits.

Olegt provided the first few digits for convenience, but offered to provide as many as required. It's more a matter of being able to follow the arithmetic, step-by-step. I have a few examples of my own I would like someone to try.

kairosfocus: Onlookers may wish to look at point 5 in 42 above for the simple but I believe useful enough metric I used, and the other points here give its context.

But you didn't calculate anything. Please try using your method with olegt's example.

3:48 pm  
Anonymous Anonymous said...

This inspired me to make a primer of how to defeat an ID theorist:

1) Come up with a random bit string. Since bit strings can encode ANYTHING, from DNA nucleotides, to poetry, to web addresses or Intel x86 register values, there is a seemingly limitless set of background patterns against which the IDer will have to compare, if they can.

(The set of background patterns is relevant to the calculation of CSI or Functional Information, since we need to see how small the islands of functionality are relative to the total set of all configurations. We need to define function in context, if looking for Functional Information, for example.)

2) Since the pattern is meaningless and random anyway, it doesn't matter that we didn't provide the set of background patterns, since they would never find a match in any set. So they will continue to search against all possible sets of background patterns.

They will lose sleep.

Not eat.

Get fired from work but they will continue to search.

3) They will never be able to conclude design, nor will they be able to conclusively rule it out. They could always research the question more, but will not be able to conclude design.

4) You can now claim victory, since no definitive answer was given. Alert all the blogs you know of and be sure to include this in a FAQ somewhere. It is probably the greatest victory the world has ever seen.

---

You can also include slight variations, such as giving an actual designed pattern but not giving the context of relevant background patterns or an estimate of their prevalence in the search space. The IDer will still have lots of trouble checking EVERY possible set of patterns, so the above will still probably hold.

Feel free to use my primer whenever you come across an IDer.

Atom

PS Don't bother replying as I don't spend time on this blog. If you want to raise the issue at UD (as Mark Frank did) I do read that blog.

5:58 pm  
Blogger Zachriel said...

Atom: PS Don't bother replying as I don't spend time on this blog. If you want to raise the issue at UD (as Mark Frank did) I do read that blog.

Hi Atom, you should be aware that Uncommon Descent is *not* an open forum, but frequently bans reasoned dissent.

6:37 pm  
Blogger Zachriel said...

Atom: 1) Come up with a random bit string.

Dembski only claims CSI is a one-way filter. In the case of an apparently random sequence, even though the sequence might have meaning to someone, without access to that knowledge, the answer would be indeterminate. We can live with a one-way filter.

That's not the problem with CSI.

6:44 pm  
Anonymous Anonymous said...

I admit that gpuccio has given a procedure for calculating functional CSI that does work if all variables are known.

For example, I tried it on a Garden of Eden pattern in the Game of Life, and came up with the unsurprising answer that a 297-bit pattern known to be unique and unevolvable, has 297 bits of CSI (for the function of being a Garden of Eden).
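That arithmetic is easy to check. A minimal sketch, assuming the target space really does contain just that one pattern:

    from math import log2

    pattern_bits = 297            # size of the Garden of Eden pattern
    target_space = 1              # stated to be unique and unevolvable
    search_space = 2 ** pattern_bits
    print(log2(search_space / target_space))   # 297.0 bits of CSI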

But if CSI depends simply on a measurement of function, then surely evolution is capable of increasing it?

For example, suppose you've chosen your function, decided that it's specific (let's say flagella with excellent swimming ability like the ones we have today). You can't rule out the natural process of evolution yet, but I'll carry on to the last step.

CSI is given by –log2 of the number of possible flagella with this swimming ability over the number of all possible sequences of that kind.

The number of possible flagella with that level of ability is the interesting one here. It's quite trivial that the higher that ability level gets, the fewer flagella there are capable of it, so CSI goes up.

But we know perfectly well that natural evolution increases fitness. It does that almost by definition.

For example, when I've been running evolution sims, I've found that my little organisms don't care much about functions. All they care about is beating the most fit organism at any given moment.

eg: a population with fitnesses of:

2, 2, 2, 1, 1, 1, 1, 1

quickly becomes

2, 2, 2, 2, 2, 2, 2, 1.

This is then likely to become:

3, 3, 2, 2, 2, 2, 2, 2.

Then

3, 3, 3, 3, 3, 3, 2, 2.

And so forth. The point is that if CSI is defined in terms of a minimum fitness, there is absolutely no reason to rule out natural evolution, which is known to increase this gradually. You can see it above.
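A toy Python model of that ratchet (not the commenter's actual sim; the mutation rate and replacement rule are invented for illustration):

    import random

    pop = [2, 2, 2, 1, 1, 1, 1, 1]        # the starting fitnesses above
    for _ in range(10):
        parent = random.choices(pop, weights=pop)[0]  # fitness-proportional pick
        child = parent + (random.random() < 0.2)      # occasional +1 mutation
        pop[pop.index(min(pop))] = child              # least-fit slot is replaced
        print(sorted(pop, reverse=True))  # the minimum fitness ratchets upward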

It can of course be argued that evolution of CSI up to 500 bits (that is, a ratio of 1 fit sequence to every 2^500 sequences, a very sparse search space) may not be possible, but then this results in a discussion about 'how connected are search spaces really', etc, which normally goes nowhere. (although I'd point out that if you already /are/ a flagellum, the chances are your next mutation will still leave you a flagellum instead of one of the quadquadquadquadrillion random molecular clouds)

8:01 pm  
Blogger oleg said...

Atom,

If you don't plan on reading this blog, why did you even bother posting here?

I must say that your invitation to debate things at UD looks a bit insincere. The place is notorious for censoring dissent. Zachriel and I have been banned from that forum. Mark's place is a neutral venue, where the discussion seems quite collegial, so I see no reason for you to run away to the safety of the UD fortress.

8:21 pm  
Blogger oleg said...

Atom wrote: The set of background patterns is relevant to the calculation of CSI or Functional Information, since we need to see how small the islands of functionality are relative to the total set of all configurations. We need to define function in context, if looking for Functional Information, for example.

Actually, no, we're not talking about functionality here. I'm asking whether my pattern is specific and complex.

8:32 pm  
Blogger gpuccio said...

Mark,

I don't have much time to spend here either, so just a few clarifications for the moment.

1) I am interested more in the application of the concept of CSI to biology than in the general mathematical formalization of CSI. I am not a mathematician, and I leave that to Dembski and others.

2) I have previously given here my idea about measuring CSI in proteins. I think it was clear and applicable to real cases. I stick to that, and have not much to add. For another empirical and very good approach to measuring functional information in proteins, based on Shannon's H, I suggest the paper:

"Measuring the functional sequence complexity of proteins"

by Durston, Chiu, Abel and Trevors

Without being a final answer, that is a very reasonable empirical approach to measuring functional complexity in proteins.

3) In my personal approach to biological application, I, like kairosfocus and others, am more comfortable with Dembski's previous definitions of CSI, or with the even earlier definition of FSCI. For the differences, at least in Dembski's thought, please see Addendum 1 in Dembski's paper you linked to. In particular, where it says, about the UPB: "Thus, for practical purposes, taking 10^−150 as a universal probability bound still works. If you will, the number stays the same, but the rationale for it has changed slightly."
Thus, for practical purposes (and my purposes are very practical) I stick to the old UPB, and will not deal with the calculations of ϕS(T), which I am not even sure I understand correctly.

4) For my practical purposes, specification remains functional specification, and in particular biological functional specification. I find that objective enough for all biological models.

5) While I am not really interested, therefore, in calculating the CSI of your string (which, I suppose, is not a DNA or protein sequence), I would just the same comment that, even if I tried, I have no way to tell you if the string is specified, because to my untrained mathematical eye it reveals no specification (indeed, I have not even seriously tried to find it). What does that mean? Simply that it is you who should tell me what possible specification you see in it (if any); otherwise I will just affirm that IMO it is random. If I am wrong, and a specification exists, this will be just a case of false negative, obviously due mainly to my mathematical ignorance. But false negatives are certainly fully admissible. The EF has never affirmed that there are no false negatives. It's the false positives which are logically possible, but empirically impossible.
Just an example: to assess this post as specified, one has to know English, or at least to be able to recognize it as a language. If this post were in a language completely unknown to me, I could have some indirect clue that it is specified, but I could never be sure. But it would be specified just the same.
In other words, specification needs to be "recognized" by what Dembski calls a "semiotic agent", but not all semiotic agents will recognize all specifications.
That's why I say: if you want me to calculate anything in your string, just tell me if, as far as you know, it is specified, and why.

The complexity of the string is easier. The complexity of the search space is obviously 1:2^60, which is complex enough for me, but not enough to satisfy Dembski's conventional UPB (1:10^150), or even the more recent value of 1:10^120 (which should anyway be coupled to the calculation of ϕS(T), if I understand correctly). So, I can already tell you that your example would not qualify as CSI, even if it were specified. Obviously, that does not mean that it is not designed: it could be, and in that case it would be one of the possible false negatives of the filter (not enough complexity).
Anyway, if we wanted to calculate the CSI in the old way, we should, once the specification is known, take into account the size of the target space (all the sequences of that length which satisfy the specification) and then take into account the probabilistic resources (which random system is supposed to have generated the random string, in what time, and so on). Then, and only then, could we calculate the real probability (complexity) of the specified target, in relation to that type of specification. I am not interested in calculating it for every possible specification, and I can't do it. So, if that is your question, I cannot help there.

More on biological applications in my next post.

8:40 pm  
Blogger gpuccio said...

Mark,

just some comments on biological CSI, which is in the form of FSCI. Here, the specification is given by a function. That's the only specification I am interested in, at least in biology (I am obviously greatly interested also in the general theory of specification, but only at a more general level).

Biological function is very different from "any possible function". It is defined in very specific contexts, and it can be empirically verified and measured.

Let's consider, for example, a general model of a living cell. To make it specific enough, let's say it is a bacterial cell. And to make it generic enough, let's say that it is a generic bacterium, so that we can include in the model most known bacterial contexts. We could obviously choose different contexts (eukaryotic cells, multicellular beings, and so on), and we should adapt the reasoning to our chosen scenario.

So, for a generic living bacterial cell, let's consider one protein of its proteome, and let's choose for simplicity one protein which has no important homology with others (that is not an important limitation, but it makes the reasoning simpler). Obviously, we assume that we know the protein, its sequence, and its biochemical function in the bacterium.

With another natural simplification, we will assume as general search space the space of all possible protein sequences of that length (that is obviously a lower limit). And we will assume that the system can be well modeled by a uniform distribution (I have often debated the reasons for that, and I will not go back to them here).

Now, we have to define the function in the context. That can be done in different ways, and each definition has different consequences for the model (and the calculations). But each definition is perfectly admissible and objective, provided that one remains consistent in the subsequent reasoning. I suggest here some possible definitions of functional targets, from the most generic to the most specific:

1) Any protein which can fold in one of the known functional ways.

2) Any protein which can fold as in 1), and which can have some biochemical function in the context (the bacterial cell).

3) Any protein as in 2), which can have any useful biochemical function in the context.

4) Any protein as in 3), which can have a function which gives survival advantage, and which can therefore be empirically fixed in the population.

5) Any protein which can have the specific function of the observed protein, at a minimum detectable level which can be fixed as we want.

These are just examples. Please notice that the size of the target space is obviously decreasing from 1) to 4), because each new definition is of a subset of the previous ones, while it is not evident at all that 4) is smaller than 5), because there is no a priori reason (and no empirical reason either) that one single functional protein with a specific function is sufficient, in itself, to give a fixable reproductive advantage.

So, for each of those definitions we can try to calculate a target space size. Sometimes it can be easier, sometimes more difficult. Maybe we are at present unable to calculate those sizes with enough precision, but we can almost always make reasonable approximations (a perfectly legitimate procedure in empirical sciences, including physics), and the important point is that all those values are in principle empirically measurable. For instance, before very long we will probably be able to say with reasonable precision how many protein sequences of a certain length can fold in a functional way. And so on.
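Under those assumptions the final arithmetic is simple. A minimal sketch in which both the protein length and the target-space size are illustrative placeholders, not measured values:

    from math import log2

    def protein_csi(length_aa, target_space_size):
        # uniform model over all sequences of this length: -log2(target/search)
        search_space_bits = length_aa * log2(20)
        return search_space_bits - log2(target_space_size)

    # e.g. a 150-aa protein with an assumed 10^30 functional sequences:
    print(protein_csi(150, 10**30))   # ~549 bits, past the 500-bit threshold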

And there is always another empirical approach which can be used, and has been used: instead of calculating probabilities, we can just start from the epidemiological observation of definite models in nature, and observe how well real facts accord with the models. That's exactly what Behe has started to do in TEOE. And there is a lot of work to do in that direction.

So, my point is: biological systems can definitely be explored, in many different ways, and pretty quantitatively, in connection with the concept of functional information. That's what must be done. And it will be done ever more, in the near future. No theory about the causal origin of biological information, be it darwinism or ID, can survive without empirical analysis of known facts.

9:13 pm  
Blogger gpuccio said...

Just a correction:

in the above post, it should be:

"while it is not evident at all that 5) is smaller than 4), becasue there is no a priori reason (and no empirical reason too) that one single functional protein with a specific function is sufficient, in itself, to give a fixable reproductive advantage."

9:18 pm  
Blogger oleg said...

gpuccio wrote: While I am not really interested, therefore, in calculating the CSI of your string (which, I suppose, is not a DNA or protein sequence), I would just the same comment that, even if I tried, I have no way to tell you if the string is specified, because to my untrained mathematical eye it reveals no specification (indeed, I have not even seriously tried to find it). What does that mean? Simply that it is you who should tell me what possible specification you see in it (if any); otherwise I will just affirm that IMO it is random. If I am wrong, and a specification exists, this will be just a case of false negative, obviously due mainly to my mathematical ignorance. But false negatives are certainly fully admissible. The EF has never affirmed that there are no false negatives.

gpuccio,

First of all, I'd like to thank you for being gracious in conceding the point, however small.

Let me also point out that the problem with Dembski's latest approach is not limited to false negatives as in this case. (My sequence was not random—see here. If someone wants a new sequence, I'll be happy to oblige.) It is also prone to false positives when one tries to rule out a natural origin. Let me explain.

My sequence was, in fact, functional: I once used its decimal equivalent as a password. Dembski discusses a very similar situation, taken from The Da Vinci Code, in Section 7 of his paper. Onlookers, observe that that sequence was only 10 decimal digits, or 33 bits long! ;) The number φ_S(T) for a purely random sequence would have been 10^10. Knowing the owner of the password, the novel's characters figure it out easily. For them, φ_S(T) was not an astronomical number.

Now you see that the inquirer's knowledge about the designer of the sequence is crucial to calculating φ_S(T) and hence CSI. If one assumes a completely random way of making a particular protein, that number would be astronomically large. Dembski's CSI is able to rule out proteins assembling from a large number of individual atoms. But nature doesn't make proteins that way. Thus we don't know φ_S(T) for them. Using the assumption of a random assembly gives a much larger number and that may lead to a false positive.
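The password arithmetic can be checked directly; in the sketch below the 1,000-candidate figure is invented purely to illustrate how background knowledge shrinks the space:

    from math import log2

    print(log2(10**10))   # ~33.2 bits: ten decimal digits under a uniform model

    # With background knowledge the space collapses. If the inquirer could
    # narrow the owner's habits down to, say, 1,000 plausible passwords:
    print(log2(10**3))    # ~10.0 bits: same string, far less apparent complexity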

9:23 pm  
Blogger gpuccio said...

oleg,

I think I can in part agree with your last comment. What I believe is that Dembski is trying (pretty much alone, and against terrible resistance) to accomplish a general formalization of the problem of CSI, meaning and function, and that is certainly not an easy task. I admire him very much for that. But I agree that he still has much work to do, even if his accomplishments up to now have tremendous significance for the problem of biological information (see for instance his recent work with Marks on active information and GAs).

On the other hand, darwinists tend to ignore completely the problem of function and meaning as related to the complexity of information and the design process. The only reason, IMO, is that they have to defend an empirical model which cannot work, and ignorance is always the best way to defend something which does not work.

Going back to biological information, I perfectly agree with you that "If one assumes a completely random way of making a particular protein, that number would be astronomically large". Then you say "But nature doesn't make proteins that way." and relate that affirmation to the calculation of φ_S(T). Again, I will not follow you there because I have already admitted (and I can easily repeat) that I am not sure that I understand correctly the concept of φ_S(T).

But more simply, if with your affirmation you mean that nature does not make proteins only by random means, again I agree with you: the darwinian theory obviously includes a process, NS, which by definition is not really random, but is a form of necessity (although in a rather twisted way). We have never denied that. I can easily admit that "if" NS could select each single variational step at the molecular level, the darwinian theory would be, if not proved, at least more credible.

But the fact is that NS cannot do that. GAs can often do that, because they have precise information about the result to be obtained. GAs can often measure the function searched for, when they don't already know the final result (see the famous, or infamous, weasel).

But, as I have often argued, nature knows nothing. NS itself is an indirect function not so much of the environment as of the replication mechanisms of the replicator. And the important point is that NS has to act on complex functions, important enough to give a reproductive advantage.

That's in no way the same as measuring a function, like protein engineers, or the immune system, can do. Measuring a function after a targeted random search is a typical engineering procedure when the engineer does not know the result in detail, but can measure any variation in the function he has to attain. Even better, if an engineer knows how to implement a function, he can directly write the pertinent information. That would be direct design. The former case is indirect design through some random search. But both cases are design.

On the contrary, NS as it appears in darwinian theory is nothing similar. There is no real model based on necessity of how any single new protein could have been attained (save very minor examples, which you certainly know, and which are based on very trivial variations in a very specific island of functionality). And I am sure that any model which can be proposed (if and when it is proposed) will be easily shown to be empirically impossible in its necessarily random, non-selectable parts.

So, I am not saying that nature is using only randomness. I accept that nature can use both randomness and necessity, in any mix you like. But the necessity part has to be specified with a credible and detailed model. And the random part has to be proven to be in the range of credible probabilistic resources.

As far as I know, neither of these two things has ever been done.

10:28 pm  
Anonymous Anonymous said...

gpuccio: I really fail to understand how you can't see how fitness functions and GAs work.

Are you aware that most of the useful GAs are not aiming for a target at all, but are merely trying to find the minima of multiple parameters?

For example, try giving a GA the fitness function of 'get a shape with a large area and small perimeter'.

You know this is a circle, but digital organisms don't; they'll simply breed until they find they're maximising their score. In the example above there'll be a time when the cost of the perimeter outweighs the benefit of the area enclosed (depending on how the two are scored).

This is the better analogy to natural selection: multiple, dependent parameters, like an organism in nature is facing. Whatever their speed, vision, hearing etc may be, the only answer nature gives back is how many kids they have.

A fitness function is simply a simulation of that.

Would you argue that these untargeted GAs are sneaking in active information? I suggested on UD that this was not true, or rather, that the only information they were sneaking in was the same amount that any organism gets from its environment anyway.

11:34 pm  
Blogger gpuccio said...

Venus,

that has been answered on UD. I don't know which GAs you refer to. In those analyzed both at UD and in Dembski and Marks' papers, the active information was always well evident.

You cannot obtain a computer program, or a functional protein, or a meaningful post, by a fitness function. The functional meaning in that information depends on the correctness of the whole piece of information. There is no way to "build" that information by simple functional steps. There is no accumulation of selectable microevolution which leads to that kind of macroevolution. I really fail to understand how you can't see that.

GAs should simulate how true CSI can emerge through RV and NS. I have said many times that the only way to simulate that is to build some kind of digital replicator which, in a digital system not programmed to make it evolve, really evolves. In other words, let us realize the principle of "any possible new function" in a real simulation environment. The environment can be any digital environment, or operating system. And the programmer must only create digital replicators which can make random errors. And let's wait for any possible function to emerge, from those random errors, which can profit from the digital environment where it is replicating.

That would be a real simulation. When I see a complex program emerging from a simple replicator, in those conditions, and expressing completely new functionalities, which were completely absent in the original replicator, and without any intentional programming of the system to specifically help those functions (indeed, the emerging functions should be completely unexpected), and with a gain of new functional information which is beyond 500 bits, then you will have convinced me.

12:01 am  
Blogger oleg said...

gpuccio,

You first agree with me that CSI is not (yet) a well-defined concept and then require that someone "should simulate how true CSI can emerge through RV and NS." Aren't you a little bit inconsistent?

12:27 am  
Blogger Zachriel said...

gpuccio: I stick to the old UPB, and will not deal with the calculations of ϕS(T), which I am not even sure I understand correctly.

The UPB is meaningless without being able to calculate the other terms. Is there anyone here who can actually "calculate some CSI"?

2:03 am  
Anonymous Anonymous said...

Mark,

My focus now has to return to other matters.

I express appreciation for your graciousness in hosting what I have had to say.

On the matters raised below, I note that I have posted the below in the EV thread at UD. I trust that it serves well enough on the substantial issue, and that it will help clarify for those seeking to understand and to discover the truth, the true objective of reason.

GEM of TKI

____________

GP (and Mark):

I took a look at the thread over at MF's blog on calculating CSI, on seeing your post just above.

GP, you have with great patience composed a more than adequate response, and then undertook onward interactions with the set of commenters there; and in far more than a mere two posts!

I applaud your effort, and agree with its overall substance and even more so with your ever so gracious tone.

Having already spent some time at CO, I will simply note here a few points:

1 --> For those who wish to look, I have updated my online note to highlight metrics of FSCI, first the intuitively obvious functionally specified bit, and also have given an excerpt on Dr Dembski's more sophisticated 2005 model.

2 --> Venus, in the old English Common Law, murder was defined as maliciously causing the death of an innocent within "a year and a day." The line has to be drawn somewhere, and you will note that in the case of interest, (1) the EF is designed to be biased towards false negatives, (2) I in fact use a range from 500 - 1,000 bits to take in reasonable cases of islands of functionality in the wider config space, and (3) the biologically relevant cases are far, far beyond the upper end of the threshold band.

3 --> I also note that I responded to a stated bit string of unspecified functionality of 60 bits length. It is not a natural occurrence observed "in the wild", and alphanumerical characters are contingent. It could be the result of a program that forces each bit, or it could be the result of a [pseudo-]random process. I simply pointed out that as given there is not enough length to be relevant to the FSCI criterion.

4 --> Subsequently, it was stated that this is the product of a definitive process in mathematics, by reference to an example of a pseudorandom string based on an algorithm, by Dr Dembski. In short, per the report taken as credible, it is the result of an algorithm, which is of course in all directly known cases, designed. And indeed, once we bring in the workings of the algorithm to generate the bit string, we immediately will observe that the complexity jumps up hugely; credibly well beyond the threshold, once we take in the statement of the algorithm, the compiling of it, and the physical execution required to implement the stated output.

I trust that these notes will be helpful.

GEM of TKI

_________________

End of UD comment.

G'day all.

GEM of TKI

PS to Z: The config spaces for the cases of interest -- e.g. DNA -- are quite well enough defined to allow reasonable empirically based models. The islands of function within those spaces are sufficiently definable to make the point of the EF very relevant. "Good enough for Government work," as they used to say. [I recall here the story my uncle used to tell me, of the mathematician and the engineer placed the same distance from a young lady and given a distance approach algorithm that would not converge to the perfect value. The mathematician saw the problem and threw up his hands in despair. The engineer took the "good enough" approximation and had a lovely dinner date!]

6:18 am  
Blogger gpuccio said...

oleg:

"You first agree with me that CSI is not (yet) a well-defined concept and then require that someone "should simulate how true CSI can emerge through RV and NS." Aren't you a little bit inconsistent?"

No, I don't agree with you on that. In the simpler form, based on empirically detectable function, which I have applied to biological models, it is a very well defined concept. It certainly requires further theoretical work to become a universal concept, applicable to any possible model. But for the purposes which I have shown, it is perfectly good.

And my proposal of simulation is, IMO, perfectly appropriate for the model which should be proven (darwinian evolution).

So, I don't think I am being inconsistent at all.

8:12 am  
Blogger gpuccio said...

Zachriel:

"The UPB is meaningless without being able to calculate the other terms. Is there anyone here who can actually "calculate some CSI"?

I thought I had given a very specific procedure for doing that in biological models. Why don't you comment on that?

8:14 am  
Blogger Mark Frank said...

Gpuccio
First, thanks for participating; without you it would be a very one-sided discussion.

My prime interest was in checking my understanding of Dembski’s paper and also how many ID proponents understand it. As you have admitted you don’t fully understand it, I guess the answer so far is zero.

I will leave comments on your own method of calculating CSI until after I have walked the dog…

8:33 am  
Blogger Mark Frank said...

Gpuccio
I wanted to spend some more time on this but a domestic problem has come up. So I will just point out a couple of oddities in your biological definition of CSI. I will try to avoid the usual long-running unresolved debates.

• Your definition is relative to a functional specification. This means there are several CSIs for any given protein depending on which functional specification you choose.

• If you choose the specification “plays the same role as this protein” then the CSI for one protein has no relationship to the CSI for another protein.

Many people compare CSI to entropy. But there is only one entropy for a system and the entropy of one system is the same kind of entropy as the entropy of another system. So your CSI seems to be a different kind of beast.

11:32 am  
Blogger gpuccio said...

Mark,

you are perfectly correct in your comment. The fact is, if you want to compare CSI to total entropy, you should compute the total CSI of the system, which would be the sum of all partial CSIs, plus the additional CSI implied in all their functional interactions. I think that is possible in principle, but empirically and theoretically difficult. And why do that, if partial "entropy" is more than enough for our discourse?

But I can't see your problem. If I consider a single protein as a sub-system, I can still compute its CSI. It is true that the function has to be defined in relation to a bigger system, but the computed CSI remains a property of that sub-system. I really don't see any problem in that.

Again, I see the concept of CSI as a very practical one. In biology, it is a measurement of how difficult it is to get the information which ensures the defined function by means of a random search. It's very simple, indeed.

Finally, I want to give an example of calculating CSI in a slightly different way, from the paper "Measuring the functional sequence complexity of proteins", which I have often cited as an alternative approach.

Let's take, from that paper, the example of the protein sequence "P53 DNA domain". Its length is given as 157 amino acids. Therefore, Shannon's H for a random sequence of the same length is given as 679 bits. For that protein, 156 sequences have been analyzed in different species, determining the uncertainty for each aa position. So, the Shannon H for the functional protein has been calculated, and the difference between the value for the random state and the value for the functional state calculated according to the following formula:

ζ = ΔH(Xg(ti), Xf(tj))

Expressed in Fits (functional bits), the result for that protein is 525 Fits. For that protein, we are therefore in the legitimate range of CSI (>500 bits).

Please note that in the paper the measurement has been done for 35 different proteins, and the results vary from a minimum of 46 Fits for Ankyrin to a maximum of 2416 Fits for Flu PB2. Another interesting value is the functional density (Fits/aa), which goes from a minimum of 1.3 Fits/aa (Bac luciferase) to a maximum of 4 Fits/aa (Flu PB2). In other words, the method is measuring not only the functional complexity of the protein, but also the functional complexity in relation to length. That value, for instance, would be 3.3 for our example of "P53 DNA domain", showing that it is a very "functionally dense" sequence (high functional information in a relatively short sequence).
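A minimal sketch of that kind of calculation, ignoring the paper's refinements; fed the 156 aligned sequences, it should land near the quoted 525 Fits:

    from math import log2
    from collections import Counter

    def fits(alignment):
        # alignment: equal-length amino-acid strings of functional variants
        n_sites = len(alignment[0])
        h_ground = n_sites * log2(20)      # 157 sites -> ~679 bits
        h_functional = 0.0
        for i in range(n_sites):
            counts = Counter(seq[i] for seq in alignment)
            total = sum(counts.values())
            h_functional -= sum(c / total * log2(c / total)
                                for c in counts.values())
        return h_ground - h_functional     # ground minus functional uncertainty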

I am not saying this is the best method, or that it has no flaw. I am just saying that it shows a way to measure something, and that what is being measured is very real, and can be a good starting point for any serious quantitative evaluation of functional information in proteins, and for a serious analysis of models.

1:07 pm  
Blogger Zachriel said...

gpuccio: So, for a generic living bacterial cell, let's consider one protein of its proteome, and let's choose for simplicity one protein which has no important homology with others (that is not an important limitation, but it makes the reasoning simpler).

What you mean is proteins with no *known* homologies. We can see from this comment that the result depends on our background knowledge.

gpuccio: With another natural simplification, we will assume as general search space the space of all possible protein sequences of that length (that is obviously a lower limit).

We have to calculate ϕS(T), or the number of equivalent sequences of comparable complexity. You said you didn't know how to calculate this term. As such, you really can't say you can reach any conclusions about the validity of Dembski's CSI.

gpuccio: And we will assume that the system can be well modeled by a uniform distribution (I have often debated the reasons for that, and I will not go back to them here).

This is P(T|H). Evolution doesn't search the entire space so a uniform probability function is inappropriate.

It really comes down to this. If no one can follow through a few simple examples, but handwaves their way to the answer, then it is reasonable to assume that CSI is not a well-defined concept. That does seem to explain why Dembski's CSI is never utilized in scientific research.

1:20 pm  
Anonymous Anonymous said...

Mark

Kindly cf here.

GEM of TKI

1:26 pm  
Blogger Zachriel said...

gpuccio: I see the concept of CSI as a very practical one.

There's nothing in any of those formulas that disallows calculating the specified complexity for shorter sequences.

gpuccio: In biology, it is a measurement of how difficult it is to get the information which ensures the defined function by means of a random search.

Except that evolutionary algorithms don't proceed by random search. According to that thinking, it would take a hundred thousand trillion steps to evolve a twelve-letter word.
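A Dawkins-style cumulative-selection toy illustrates the arithmetic; the target word and parameters are arbitrary, and the explicit target is exactly the feature gpuccio objects to below, so this shows only that selection is not random search:

    import random
    import string

    TARGET = "EVOLUTIONARY"                # an arbitrary twelve-letter word
    ALPHABET = string.ascii_uppercase

    def mutate(s, rate=0.04):
        return "".join(random.choice(ALPHABET) if random.random() < rate else c
                       for c in s)

    parent = "".join(random.choice(ALPHABET) for _ in range(len(TARGET)))
    generations = 0
    while parent != TARGET:
        generations += 1
        offspring = [mutate(parent) for _ in range(100)]   # keep the best of 100
        parent = max(offspring,
                     key=lambda s: sum(a == b for a, b in zip(s, TARGET)))
    print(generations)   # typically a few hundred, vs 26**12 (~9.5e16) blind trials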

1:30 pm  
Blogger oleg said...

gpuccio: But I can't see your problem. If I consider a single protein as a sub-system, I can still compute its CSI. It is true that the function has to be defined in relation to a bigger system, but the computed CSI remains a property of that sub-system. I really don't see any problem in that.

I second Zachriel's reply. You can't calculate the CSI of natural protein formation.

Recall Dembski's example of computing CSI for the password in The Da Vinci Code. If you didn't know how the password designer operated and assumed a random sequence, you got 10^10 possible passwords. However, someone familiar with the designer knew what kinds of sequences he would use and had to contend with a much smaller space.

The same with the natural path to a "functional" protein. If you have no idea how proteins arose and use a model where their atoms get together "by sheer dumb luck", there is an enormous number of ways in which these atoms can be assembled. But, to reiterate what I have already said, that isn't how proteins arose. You don't have a working model of that, so you can't evaluate their CSI. You shouldn't make this claim.

1:48 pm  
Blogger gpuccio said...

Zachriel and oleg:

Your objections are really strange. Let's see if we can understand each other better.

I start with Zachriel:

"What you mean is proteins with no *known* homologies. We can see from this comment that the result depends on our background knowledge."

And so? All our scientific results depend on our background knowledge. I really can't understand what opinion you have of empirical science. And what are you suggesting? That all existing proteins have homologies to a common ancestor protein? Could you please, for once, express a reasonable model, instead of blindly and obstinately criticizing?

"We have to calculate ϕS(T), or the number of equivalent sequences of comparable complexity. You said you didn't know how to calculate this term. As such, you really can't say you can reach any conclusions about the validity of Dembski's CSI."

No, I have stated clearly enough, I think, that I am not using Dembski's definition of CSI in the 2005 paper. Mark has understood that simple concept; why can't you do the same? So, I don't calculate any ϕS(T), and I feel very happy about that!
That does not mean that I am not calculating CSI, according to my definition which I have given, or if you want according to Dembski's previous definition, or if you want according to the definition used by kairosfocus and a lot of other people.

"Evolution doesn't search the entire space so a uniform probability function is inappropriate. "

Again with that! How do you think that so many different proteins, with completely different primary sequences and lengths, have been generated? Without traversing the entire space? I know how: by design. But you? What is your model?

Or are you still stating that evolution works with what is already there? How is it, then, that we do not have thousands of similar proteins, all relegated to the same island of functionality, with only slight differences between them?

I paste here part of one post of mine at UD:

"I assume a quasi uniform distribution for the search space for two important reasons:

1) It is perfectly reasonable from what we know of random genetic mutations.

2) The proteins we do know (and we know a lot of them) are really interspersed in the search space, in myriads of different and distant “islands” of functionality.

You don’t have to take my word for that. It’s not an abstract and mathematical argument. We know protein sequences. Just look at them.

Go, for example, to the SCOP site, and just look at the hierarchical classification of protein structures: classes (7), folds (1086), superfamilies (1777), families (3464). Then, spend a little time, as I have done, taking a couple of random different proteins from two different classes, or even from the same superfamily, go to the BLAST site and try to blast them one against the other, and see how much "similarity" you find: you will probably find none. And if you BLAST a single protein against all those known, you will probably find similarities only with proteins of the same kind, if not with the same protein in different species. Sometimes, partial similarities are due to common domains for common functions, but even that leaves enormous differences in terms of amino acid sequence."

3:06 pm  
Blogger gpuccio said...

Still with Zachriel:

"If no one can follow through a few simple examples, but handwaves their way to the answer, then it is reasonable to assume that CSI is not a well-defined concept."

I have given specific examples. Unless you are still speaking of calculating ϕS(T)...

"There's nothing in any of those formulas that disallows calculating the specified complexity for shorter sequences."

Are you referring to which example? The 60-bit pi example? Well, I think it's easy enough to calculate CSI "in the old way" in that case. The search space is 2^60. The target space is one (if the specification is to correspond to the first digits of pi, only one sequence corresponds and expresses the function). So:

a) The sequence is specified

b) It has a complexity of 1:2^60 (that is 1:10^18). That would not qualify as CSI according to the traditional UPB. But if I found that result in a biological model, I would still believe, personally, that it is designed.

To be clear, the sequence "is" designed, so Dembski's threshold would give here, as expected, a false negative.

And to be clear, I am not saying that pi is designed. I am saying that a digital sequence of bits which corresponds to the numeric value of pi is designed (unless there is a random system or a set of necessary laws which can generate it in the absence of design, which I am not aware of).
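The numbers are quick to verify under the same uniform chance hypothesis (the bound shown is Dembski's traditional UPB):

    from math import log2

    complexity = log2(2**60)      # one target among all 60-bit strings: 60 bits
    upb = -log2(10**-150)         # Dembski's traditional bound: ~498 bits
    print(complexity, upb)        # 60.0 498.3: specified, but not complex enough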

"Except that evolutionary algorithms don't proceed by random search. According to that thinking, it would take a hundred thousand trillion steps to evolve a twelve-letter word."

So, how do evolutionary algorithms proceed? I have already admitted that, if you can select all the intermediate results, the game becomes easy. But you cannot do that. Show me how you can generate proteins selecting all the intermediate steps, in a model where all the intermediate steps are in the range of reasonable probabilistic resources.
Or just explain why the simulation I have proposed is not a valid model of darwinian evolution.

3:23 pm  
Blogger gpuccio said...

oleg:

"I second Zachriel's reply."

It's your privilege.

"You can't calculate CSI of the natural protein formation. "

Why not? I had just shown an example, and with a method which is not mine, in my post of 1.07 pm. Your post is of 1.48 pm. Why don't you comment on that? Didn't you have enough time to read what I wrote?

"Recall Dembski's example of computing CSI for the password in The Da Vinci Code. If you didn't know how the password designer operated and assumed that a random sequence, you got 10^10 possible passwords. However, someone familiar with the designer knew what kinds of sequences he would use and they had to contend with a much smaller space. "

I don't understand your point: that's exactly what design does: a designer, as you say, "has to contend with a much smaller space", because he recognizes meaning. Personally, anyway, I would never have used the Da Vinci example: it's ambiguous and unrealistic, and anyway I don't like the book...

"The same with the natural path to a "functional" protein. If you have no idea how proteins arose and use a model where its atoms get together "by sheer dumb luck", there is an enormous number of ways in which these atoms can be assembled."

Again I don't understand your point. I am commenting on the darwinian model, which seems to have a very arrogant idea of how proteins arose. And, in the darwinian model, the engine of variation is random variation, that is, atoms getting together "by sheer dumb luck". Are we speaking of the same thing? Or are you supporting some unknown model of which I am not aware?

So, what do you mean by "natural path to a "functional" protein"? Do you mean the supposed darwinian pathway where each step can be selected by reproductive advantage?

Well, if that is the case, please show it! It's your model we are discussing, after all, not mine: RV + NS.

So, let's sum it up again, for clarity: we have completely different proteins, a lot of them (see my post about SCOP). With completely different foldings, and functions. And, above all, with completely different primary sequences. They are distant islands of functionality in the search space (remember that it is the primary sequence which determines the distance). All these things are known facts, not assumptions.

The surprising assumption, which you make, is that it is possible to pass from one island to another without traversing the whole space. Or that intermediate islands exist everywhere, near enough that one can go from one to the other without facing too many statistical difficulties, and that in all known cases.

Well, that's an assumption if I ever saw one! And you are surprised that I am asking you to detail it?

"But, to reiterate what I have already said, that isn't how proteins arose."

So, how did they arise? Please, detail and elaborate.

"You don't have a working model of that, so you can't evaluate their CSI."

Yes, I have the working model of darwinian evolution: random variation plus NS. It's according to that model that I am evaluating. I am evaluating the CSI to verify that the RV part, in the model, is sufficient to generate it, together with all the NS which you will be able to detail in the model itself. In other words, I am reasoning scientifically, and seriously evaluating a model accepted by most, which IMO does not work.

"You shouldn't make this claim."

I am afraid that I have done it, I am still doing it, and I am sticking to it.

4:27 pm  
Blogger Mark Frank said...

Gpuccio

Thanks for working through your definition of CSI. The worked examples make it clear what you mean, and also make clear what the problems are! (I think you left out the need to estimate how many attempts nature has made to achieve the target – but I can see how you can also do that.)

You have a method for coming up with a number. The next question is – so what?

To me the key problems with your method are:

• The choice of functional specification is a matter of opinion.

• The uniform probability distribution (UPD) assumption is wrong. I know you have debated it many times but I still cannot see how you can justify it. We know that new DNA sequences are not generated by throwing all the base pairs into a pot and stirring them. They are generated by various types of changes to sequences that are already functional in some respect.

But I want to go one step further.

Let’s

* set aside the subjective nature of the functional specification

* assume that your calculation demonstrates that a certain type of protein is incredibly unlikely to arise with a UPD.

* accept that this counts as evidence against a UPD model (not an obvious conclusion).

You then say that there is no model of how the protein can be produced with RM+NS:
“the necessity part has to be specified with a credible and detailed model. And the random part has to be proven to be in the range of credible probabilistic resources.”
and conclude design.

In other words, for you, it comes down to two alternatives: a non-UPD natural model or design.

Well maybe we don’t have the credible and detailed natural model (I am not enough of a biologist to be sure) but we do have some evidence for a non-UPD model:

• We understand and can observe many of the processes for generating new DNA. We can see how they would lead to a much better chance of a new functional protein than a UPD model.

• We have, as you put it “very minor examples, which you certainly know, and which are based on very trivial variations in a very specific island of functionality”. They may be minor but they show the process can work.

• We have many, many proposals for much larger change which are not worked out in so much detail – as you would call them “just so stories”. They may lack detail and they may be proposals rather than proven accounts – but they are possible and open to investigation.

I am sure I am greatly underselling evolutionary biology but I am not a biologist and this will do for a start.

Now where is the evidence for a model including design?

5:46 pm  
Anonymous Anonymous said...

gpuccio, your pi specification was chosen to match oleg's sequence. The problem with this has been pointed out by Wein, English, and probably others. In fact, we can conclude that anything is specified by simply choosing the sequence or event itself as the specification.

5:49 pm  
Blogger oleg said...

gpuccio,

I am on a deadline and can't reply at length, but let me ask you this.

You said: "I am evaluating the CSI to verify that the RV part, in the model, be sufficient to generate it, together with all the NS which you will be able to detail in the model itself. In other words, I am reasoning scientifically, and seriously evaluating a model accepted by most, and which IMO does not work."

As Zachriel and I mentioned several times, the probability of a sequence depends on the distribution one assumes for it. If you assume a completely uniform distribution of all possible positions of atoms, the number of possible configurations is astronomically large. If you take into account constraints imposed by physics (atoms do not overlap), the number goes down. Chemistry renders most configurations unstable (e.g., through hydrogen bonding). Biology imposes its own constraints: "defective" proteins give rise to less viable replicators that are weeded out through natural selection, so these proteins are not reproduced.

It seems to me that the Durston, Chiu, Abel and Trevors calculation takes into account the crudest physical constraints but does not take into account the limitations imposed by chemical and biological factors. That would seem to overestimate the amount of information and potentially lead to a false positive.

Any thoughts?

8:26 pm  
Anonymous Anonymous said...

A few more responses to gpuccio:

- If 60 bits is too few to be considered CSI, then how is it that phone numbers and credit card numbers are examples of CSI?

- If you mean that oleg's sequence is designed according to Dembski's definition of the word, then you're begging the question of whether oleg's thought processes fall outside of chance+necessity.

- Dembski's "complexity" is a function of a null-hypothesized event. Complexity can be ascribed to a physical instance of a sequence, but not the sequence per se. A sequence may be very improbable in one physical context but not in another. Why has nobody asked oleg what physical phenomenon the sequence describes? If it's a description of a planet's circumference relative to its diameter, then I can think of a natural hypothesis that renders it highly probable. If it describes some writing on a piece of paper, then it's probably human-made.

8:26 pm  
Anonymous Anonymous said...

Mark

Decided to check back.

Have you (and your commenters) noticed what I linked to at 1:26 pm on your timestamp [much less the very first post, which presents a link to an intuitively much easier to grasp solution to the original case posed by Oleg . . .]?

Last post before this is up at 8:26 pm, seven hours later, and it does not seem to have been noticed that there is a suggested calculation relative to the set challenge on this thread, i.e. the X-metric for your hand of spades.

I again link the UD post where it is presented.

G'day

GEM of TKI

8:44 pm  
Blogger oleg said...

gpuccio said: So, how do evolutionary algorithms proceed? I have already admitted that, if you can select all the intermediate results, the game becomes easy. But you cannot do that. Show me how you can generate proteins selecting all the intermediate steps, in a model where all the intermediate steps are in the range of reasonable probabilistic resources.

Selection in the presence of random variations has been discussed at length at Panda's Thumb in response to Salvador Cordova's UD post, Gambler’s ruin is Darwin’s ruin. Joe Felsenstein, a professor of genome sciences and biology (and also an adjunct professor of computer science and statistics) at the University of Washington, wrote a good exposition of the subject, Gambler's Ruin is Darwin's Gain. I strongly encourage you to read the posts and the follow-up discussion, which is long but well worth your time. The main idea is quite simple: RV + NS does not equal a random walk. It is only slightly less effective than perfect selection.

8:47 pm  
Blogger gpuccio said...

Mark:

why is it that I always find your comments particularly stimulating?

Well, first of all I want to specify that the purpose of ID, and mine in particular, is not to falsify any possible non-design theory of evolution. I am satisfied with falsifying the explicit ones, and even that does not mean that they are demonstrated logically impossible, but rather empirically unsupported (well, they are also somewhat logically inconsistent, but let's ignore that for the moment). I have always specified that empirical science is a search for the best explanation, not for absolute truth. And if and when new theories of unguided evolution, with new causal mechanisms, are made explicit and detailed, we will seriously take them into consideration (to falsify them too, obviously, which will be great fun!). But we cannot falsify every possible theory which can ever be formulated. That's really too much even for an IDist.

And it's not necessary. If you agree with me that no specific prejudice should affect science, the design hypothesis and the non-design hypotheses can well co-exist and have a useful interaction, and anybody is free to judge which is at present the best explanation. Obviously, such an idyllic scenario, which at present seems to be realized only between you and me, is far from what really happens. But, you will pardon my partiality, I really think that the main responsibility does not lie with ID.

That said, I don't think that you are underselling evolutionary biology: you are still overselling it, but not so much as it is usually oversold, and so we can reason...

You ask where is the evidence for a model including design. Well, the answer is very simple: the evidence is design itself. We should not forget that even Dawkins admits that the "appearance" of design is everywhere in biology. But who says that it is an "appearance"? Evolutionary biology, obviously. And what if evolutionary biology is wrong (and it is wrong!)?

Well, in that case we are back where we started: biological entities appear designed, and they may well be designed, if no other explanation applies.

Moreover, I invite you to reflect on the argument recently made by Gil Dodgen about the canals of Mars, which IMO is a very strong argument. The "appearance" of design not only is not going away as we understand better the nature of biological reality, but is rather increasing exponentially. For me, who try to be as empirical as possible, that means a lot.

So, let's see: Darwin and his contemporaries believed in blobs of protoplasm with a gross membrane, and they were wrong. I studied medicine in the '70s, and they taught me that the cell was rather more complex, and that the membrane was a double layer of lipids, and that genes made proteins and that was the secret of life, and that operons and simple feedbacks controlled everything. I believed that, and I was wrong.

Now, about 30 years later, one of my deepest satisfactions is to just read the new data which daily come out of experimental biology and molecular medicine (just stopping before the ritualistic conclusions which, denying what has been said in the article, devotionally praise the unexpected omnipotence of evolution). I enjoy the knowledge, and don't believe the conclusions: and I am right and happy.

Those who, like you, do not have specific knowledge of biology and medicine are IMO partially excused. After all, you may not have a realistic idea of what we currently know (and, especially, of what we currently don't know, but are beginning to see in the far distance). You may have no idea of how much things have changed. You may have no idea of how much design and teleology and intelligence, which could be just one hypothesis at the time of Darwin, are today more than certain.

And, finally, let's remember that human design can and does easily generate new CSI.

But let's go to your "underselling". You say:

"We understand and can observe many of the processes for generating new DNA. We can see how it would lead to a much better chance of new functional protein than a UDP model."

I don't understand what you mean. I suppose that here you are only showing that you don't understand very much biology (no problem in that). What are "the processes for generating new DNA"? I don't know of any except:

1) the regular process (copying existing DNA via a very specialized and complex system, full of sophisticated machineries and processes);

2) artificial human processes to amplify those natural processes or modify them (genetic engineering, PCR, etc)

3) random errors in those regular processes, especially during replication, be they single nucleotide substitutions, duplications, deletions, insertions, and so on. All of them random, all of them similar in meaning.

That's all. Where are the "many processes"? Don't tell me that you believe in Allen MacNeill's list!

You say:

"We have, as you put it “very minor examples, which you certainly know, and which are based on very trivial variations in a very specific island of functionality”. They may be minor but they show the process can work."

No, they are "really" trivial, and they only show that elementary processes of variation (let's call them "minimal microevolution" can happen. Those examples are almost always extremely artificial, and they imply usually a single aminoacid substitution. They mean nothing.

One of the best natural examples is the antifreeze protein. That is somewhat interesting, although its meaning is extremely limited. Behe has already dealt with it in his latest book. But the interesting thing is that the original paper concludes:

"The notothenioid trypsinogen to AFGP conversion is the first clear example of how an old protein gene
spawned a new gene for an entirely new protein with a new
function. It also represents a rare instance in which protein
evolution, organismal adaptation, and environmental conditions
can be linked directly."

And the paper is from 1997, not from 1897.

"We have many, many proposals for much larger change which are not worked out in so much detail – as you would call them “just so stories”. They may lack detail and they may be proposals rather than proven accounts – but they are possible and open to investigation."

I absolutely agree! So, let them investigate. I look forward to knowing the results of those investigations. Because they will probably help the design scenario.

And as I said, I am ready to seriously analyze any new proposal which is sufficiently more than a "just so story" to be susceptible to rational analysis. And, if possible, to falsify it. In a sense, I am in part (but only in part) a Popperian.

9:30 pm  
Blogger gpuccio said...

oleg:

I suppose you too are not very familiar with biology.

You say:

"If you assume a completely uniform distribution of all possible positions of atoms, the number of possible configurations is astronomically large. If you take into account constraints imposed by
physics (atoms do not overlap), the number goes down. Chemistry renders most configurations unstable (e.g., through hydrogen bonding). Biology imposes its own constraints: "defective" proteins give rise to less viable replicators that are weeded out through natural selection, so these proteins are not reproduced."

It's the second time that you make that argument, so I will answer it this time. Apparently, you are greatly confused. I am not assuming a quasi uniform distribution of all the possible configurations of the atoms in a protein. That has nothing to do with my reasoning, and I think you are in some way referring to what is known as the "Levinthal paradox", which is an argument about protein folding that nobody has really taken seriously.

No, I am obviously speaking of the configuration space of all the possible nucleotide sequences in protein coding genes. It's there that the information is, and changes, not in proteins. And I am not aware of special biochemical constraints on those sequences (some may apply, but nothing fundamental). In other words, the new information has to be built at the DNA level through random variation, and I am not aware of any really relevant constraint on the possible nucleotide sequences at that level. That's why I assume a quasi uniform distribution of those sequences, which is reflected in a quasi uniform distribution of the respective proteins. It is trivial that non functional proteins will not fold or work, and if the problem is serious enough they will be discarded from the gene pool. That is negative selection, which is the only well documented mechanism in evolutionary biology. But what has that to do with the distribution of DNA sequences?

You say:

"It seems to me that the Trevor and Abel calculation takes into account the crudest physical constraints but does not take into account the limitations imposed by chemical and biological factors. That would seem to overestimate the amount of information and potentially lead to a false positive."

No, the Trevors and Abel calculation takes into account essentially the variability of single amino acids in the same protein in various species, which is exactly what, in evolutionary theory, is supposed to be the result of negative, positive, or neutral selection; in other words, of functionality. In brief, if an amino acid never varies in all known instances of a protein, its H in the functional state is minimal, and it contributes much to the final Fits (which are the difference between H in the random state and H in the functional state). That obviously means that the amino acid has to be that one, for the function to work. That is in accord with the assumption of NS, namely that if a sequence is conserved it means that it has undergone positive selection, "because it is functional".
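
To make the per-site arithmetic concrete, here is a minimal sketch in Python; the two-site alignment is an invented toy, and the published method also applies gap handling and small-sample corrections that are omitted here:

    import math
    from collections import Counter

    def fits(alignment):
        # sum over aligned columns of H(random) - H(functional), where
        # H(random) = log2(20) for 20 amino acids and H(functional) is
        # the Shannon entropy of the residues observed at that position
        h_random = math.log2(20)
        total = 0.0
        for column in zip(*alignment):
            n = len(column)
            counts = Counter(column)
            h_func = -sum((c / n) * math.log2(c / n) for c in counts.values())
            total += h_random - h_func
        return total

    # toy alignment of a 2-residue "protein" in four species: site 1 is
    # invariant (~4.32 Fits), site 2 varies (~2.82 Fits)
    print(fits(["ML", "MV", "MI", "ML"]))   # ~7.14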

9:50 pm  
Blogger gpuccio said...

R0b:

You say:

"gpuccio, your pi specification was chosen to match oleg's sequence. The problem with this has been pointed out by Wein, English, and probably others. In fact, we can conclude that anything is specified by simply choosing the sequence or event itself as the specification."

What do you mean? It was Oleg who told me that the sequence was the binary form of pi. I have not even verified that it is so. I have faith in people (to a certain point). I don't understand your argument; could you please elaborate?

9:52 pm  
Blogger gpuccio said...

R0b:

You say:

"If 60 bits is too few to be considered CSI, then how is it that phone numbers and credit card numbers are examples of CSI?"

Who says that? Let's clarify a few points once and for all:

a) if we define CSI by a threshold, it's because we want to be as sure as possible that we will not have false positives. That will give us a lot of false negatives. Is that clear?

b) therefore, a lot of things are designed, but do not exhibit CSI. Something can be designed, but have a very low complexity. Where is the problem?

c) where to put the threshold is a conventional problem. Dembski puts it at 1:10^150 (in his first formulation), because he wants to be "really" sure. Personally, I would put it at no more than 1:10^50. The reason is simple. The main purpose of that threshold is to make sure that we are exhausting all possible probabilistic resources in the known universe. That brings us (in excess) to the UPB. But for our planet, and for biological probabilistic resources, it is obvious that a threshold at 1:10^50 is more than enough. But again, it is a conventional limit.

You say:

"If you mean that the oleg's sequence is designed according to Dembski's definition of the word, then you're begging the question of whether oleg's thought processes fall outside of chance+necessity."

It is well known (ask Mark!) that for me, all conscious processes "fall outside of chance+necessity". I believe they are possible only because of the existence of a transcendental "I".

"Dembski's "complexity" is a function of a null-hypothesized event. Complexity can be ascribed to a physical instance of a sequence, but not the sequence per se. A sequence may be very improbable in one physical context but not in another. Why has nobody asked oleg what physical phenomenon the sequence describes? If it's a description of a planet's circumference relative to its diameter, then I can think of a natural hypothesis that renders it highly probable. If it describes some writing on a piece of paper, then it's probably human-made."

I think I have answered that in my post of (when was it?) 3.23 pm. I paste here the relevant phrase:

"And to be clear, I am not saying that pi is designed. I am saying that a digital sequence of bits which corresponds to the numeric value of pi is designed (unless there is a random system or a set of necessary laws which can generate it in absence of design, which I am not aware of)."

And anyway, while the ratio of a planet's circumference relative to its diameter is not necessarily designed, a numeric description of it certainly is.

And the null hypothesis is always that the data have come out in some random system. Necessity has to be ruled out separately in the EF, as you certainly know.

10:14 pm  
Blogger gpuccio said...

oleg:

I will certainly read your links, as soon as I have time. For the moment, I am a little tired: it's been rather a tour de force here!

10:17 pm  
Blogger Zachriel said...

Zachriel: What you mean is proteins with no *known* homologies. We can see from this comment that the result depends on our background knowledge.

gpuccio: And so? All our scientific results depend on our background knowledge. I really can't understand which opinion you have of empirical science.

Because if you are unaware of homologous ancestors, you will then assume a uniform distribution yielding a claim of design. This result changes when you are made aware of plausible ancestors.

gpuccio: No, I have stated clearly enough, I think, that I am not using Dembski's definition of CSI in the 2005 paper.

Sigh. That's alright, but no one seems to be interested in defending Dembski's paper.

10:21 pm  
Blogger Mark Frank said...

gpuccio

", the answer is very simple: the evidence is design itself. "

That's circular. I have never seen an ID proponent offer any evidence for design of life except the implausibility of natural selection.

Real evidence of design would ideally be observation of the design process. Failing that, at least a hypothesis about what the designer was trying to achieve.

Processes for generating DNA.

That's my phrase for all the things you mention plus recombination. My point is that they are far more likely to generate proteins that are functional than mixing all the base pairs at random, because they start with proteins that are already functional. If you take a working hemoglobin molecule and make a single mutation you will most likely be worse off, but you have a pretty good chance of ending up with a working hemoglobin molecule - and you might even improve your resistance to malaria. If you just randomly allocate amino acids to 560-odd positions you will almost certainly get rubbish.

10:27 pm  
Anonymous Anonymous said...

gpuccio, with respect and collegiality, you are almost as prolix as your colleague, kf.

It would be a kindness if you could be more focused and concise.

Compare the posts of your main adversaries here. They are excellent models of clear writing.

10:29 pm  
Blogger Mark Frank said...

Adelard

Don't be too hard on gpuccio.

English is not his native language. He could be more concise (couldn't we all) and I have once or twice complained about the length of his comments. But he writes much better than some native-English ID proponents. It is hard work to respond to many adversaries and I appreciate his efforts.

gpuccio - it is also true that we would all benefit if you could edit your comments.

10:42 pm  
Blogger gpuccio said...

Mark:

it's not circular. In reality, as you should know, the whole argument is:

a) humans design things.

b) CSI is connected to human design, and nothing else (except biological information).

c) biological information exhibits CSI, like human artifacts.

d) therefore, a causal process similar to human design is a reasonable hypothesis for biological information.

You may accept that or not, but it's not circular.

You say:

"My point is that they are far more likely to generate proteins that are functional then mixing all the base pairs at random because they start with proteins that are already functional."

No, that's true only if you want to remain in the same "island" of functionality, and effect only small changes, just to try to refine one function which already exists. But what about reaching a new island, a new function, and a completely different primary sequence? As I have said many times, we have lots and lots of different sequences and different functions. That's a fact, not an assumption.

"but you have a pretty good chance of ending up with a working hemoglobin molecule - and you might even improve your resistance to malaria."

Please bear in mind that resistance to malaria comes from a very inefficient mutation of hemoglobin (hemoglobin S), which can survive, and badly, only in the heterozygous state.

"If you just randomly allocate amino acids to 560 odd positions you will almost certainly get rubbish."

That's certainly true (and I would almost take out the "almost"). But the same is true if you try to transform a duplicated gene of 560 aa into a completely different one.

In other words, a non uniform distribution is appropriate only for small changes all around an existing and limited function, and is driven mainly by negative selection. But if you have to traverse the space all around (and you do), then a quasi uniform distribution applies.

10:48 pm  
Blogger gpuccio said...

adelard and mark:

excuse me, but you ask and I answer. I will not edit my comments to the point of not making my points. That would be too easy for you! :-)

Maybe if all of you become more receptive, we could find a reasonable compromise...

10:50 pm  
Blogger Zachriel said...

gpuccio: That does not mean that I am not calculating CSI, according to my definition which I have given,

Let me make sure I understand your method. You take the specificity (i.e. uniqueness of the description) divided by the number of possible sequences (uniform distribution). Is this correct?

Is this written down somewhere? I just want to make sure we understand exactly how the calculation is done.

10:51 pm  
Blogger gpuccio said...

Zachriel:

"Because if you are unaware of homologous ancestors, you will then assume a uniform distribution yielding a claim of design. This result changes when you are made aware of plausible ancestors."

Well, I am waiting to become aware of those homologous ancestors, when there will be some evidence of them. Or are we speaking of just so ancestors?

Moreover, I don't understand: if you have two completely different proteins now, it seems that you need two completely different ancestors in the past. Or, if you have a common ancestor, then it could not be very homologous to the two different proteins. What am I missing? In other words, if today you are here, and there, and there, you had to traverse the space at some time, didn't you?

10:56 pm  
Blogger gpuccio said...

Zachriel:

No, I have suggested two different empirical methods. One (mine) is more theoretical and requires further knowledge about proteins to be applied with precision (but is in principle completely empirical). The other one is from the paper which I have discussed in my post of 1.07 pm, and is immediately applicable. So, please check that post for the second (I am accused of being too long). I will just sum up my method for a functional protein:

a) You calculate the search space as the combinatorial space of a protein sequence of the same length (that's obviously a lower bound).

b) You make reasonable assumptions about an upper bound on the target space (the number of protein sequences which can express that function at a threshold level).

c) You take b/a, and compare it to a threshold (let's say the UPB, but I would very much redefine it).

If the threshold is reached, b/a is a computation of the CSI (as complexity), probably to be expressed as a negative logarithm for convenience. So, it would roughly correspond to the Fits in the second method.

b) is obviously the most difficult step, but it is open to empirical research (all the research on protein engineering is obviously helping). I am confident that in a short time we will be able to calculate b) with reasonable approximation.

The second method bypasses calculation of b) by measuring the functional space in existing proteins in different species, and expressing it as a reduction of H. It is IMO a very convenient method, even if it makes the implicit assumption that the functional space has been completely or almost completely explored in the course of natural history.
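
As a toy illustration of steps a) to c), here is a minimal sketch in Python; the protein length and the target-space estimate are invented placeholders, since estimating b) is precisely the open empirical question:

    import math

    def csi_bits(length, functional_count):
        # a) search space: 20^length possible sequences (uniform assumption)
        search_bits = length * math.log2(20)
        # b) target space: assumed number of sequences performing the function
        target_bits = math.log2(functional_count)
        # c) -log2(b/a), to be compared against a threshold such as the UPB
        return search_bits - target_bits

    # invented numbers: 150 residues, with 10^30 functional sequences assumed
    print(csi_bits(150, 1e30))   # ~548.6 bits, above a 500-bit threshold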

11:10 pm  
Blogger Zachriel said...

gpuccio: Well, I am waiting to become aware of those homologous ancestors, when there will be some evidence of them. Or are we speaking of just so ancestors?

It's not known if all proteins share a common ancestor (though many important protein domains were already established in life's last common ancestor). Proteins may have multiple sources related to life's primordial origins.

Your argument is that if we don't know, then design. You think that positing a uniform probability distribution and then doing a big-number division changes the basic fallacy, but it doesn't.

Nonetheless, more detailed studies have found homologies where none were detected previously. That you are unaware of this just shows the sensitivity of CSI to false positives.

gpuccio: Moreover, I don't understand: if you have two completely different proteins now, it seems that you need two completely different ancestors in the past.

This is not necessarily true. Complex proteins may have diverged from very humble beginnings and left no discernible trace of their common ancestry.

gpuccio: In other words, if today you are here, and there, and there, you had to traverse the space sometime, hadn't you?

That's incorrect. That's like saying you traverse the space between cats and dogs. You don't. They descended from a common ancestor.

gpuccio: You calculate the search space as the combinatorial space of a protein sequence of the same length

Are you saying that CSI only works for proteins?

11:18 pm  
Anonymous Anonymous said...

gpuccio: Who says that?

Dembski. "The 16-digit number on your VISA card is an example of CSI. ... Even your phone number constitutes CSI."

Dembski doesn't use terms like CSI consistently, so it seems that ID opponents should be afforded the same latitude.

gpuccio: And anyway, while the ratio of a planet's circumference relative to its diameter is not necessarily designed, a numeric description of it certainly is.

Of course oleg typed or copied the sequence, but did it come from nature or was it invented whole-cloth? If we consider the sequence without hypothesizing its source, what do we use for a probability distribution?

And the null hypothesis is always that the data have come out in some random system. Necessity has to be ruled out separately in the EF, as you certainly know.

By "random system" do you mean uniformly random, or any distribution (which would include necessity, as Dembski has pointed out)?

11:29 pm  
Blogger gpuccio said...

Zachriel:

"It's not known if all proteins share a common ancestor (though many important protein domains were already established in life's last common ancestor). Proteins may have multiple sources related to life's primordial origins."

And, obviously, in OOL scenarios, important proteins had no necessity of traversing the search space? I feel that the atmosphere is becoming increasingly mystical...
Anyway, you are entitled to believe even stranger things. I never question the personal faith of people.

"Your argument is that if we don't know, then design. You think by positing a uniform probability distribution then making a big number division it changes the basic fallacy, but it doesn't."

There is no basic fallacy. My argument is: you don't know, and I have a good alternative. It's a very reasonable argument, and not a fallacy: except for those who are committed to another explanation in spite of any empirical evidence.

"Nonetheless, more detailed studies have found homologies where none were detected previously. That you are unaware of this just shows the sensitivity of CSI to false positives."

You can find homologies everywhere, if you are really determined. And I must confess that I cannot find any detectable sense in your last phrase. Perhaps it was not designed...

"This is not necessarily true. Complex proteins may have diverged from very humble beginnings and left no discernable trace of their common ancestry."

It is not necessarily true, but it is true just the same. Unless we add "just so humble beginnings" to our "just so" collection. Which, by the way, is becoming increasingly large. Are you aware of how many unwarranted entities you have brought to life just in your last post?

"Are you saying that CSI only works for proteins?"

No, I am saying that proteins (or, if you want, protein genes) are the system where it is easiest to calculate it. If we go to higher levels of organization, CSI certainly increases by many orders of magnitude, but it would be increasingly difficult to calculate it. That is a very common situation in science, especially physics. That's why I have never understood why Dembski has tried to compute the CSI of the flagellum. He could have stopped at one of its proteins. And anyway, the CSI of a complex machine is "at least" the sum of the CSI of its components (indeed, it is evidently much more).

11:58 pm  
Blogger gpuccio said...

R0b:

"Of course oleg typed or copied the sequence, but did it come from nature or was it invented whole-cloth? If we consider the sequence without hypothesizing its source, what do we use for a probability distribution?"

Again, I don't understand. If the sequence was pi (first digits), it was specified because there is only one pi sequence, and no other approximation. I maintain that the "number" pi (in any base, be it binary or decimal) cannot "come from nature". Numbers are a creation of the human mind, especially real, irrational and transcendental numbers. And there is no choice about the probability distribution to assume. The target space is 1, the search space is 2^60, and in the absence of any information about the random system which could have generated the sequence, you can only assume a uniform distribution. As to necessity, I am not aware of any natural law which can output the sequence of pi.
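
The arithmetic here is easy to check; a minimal sketch (assuming Python with the mpmath library is available) regenerates the first 60 binary digits of pi and scores them against the uniform null:

    import math
    from mpmath import mp

    mp.prec = 100                          # binary precision well above 60 bits
    pi_bits = format(int(mp.floor(mp.pi * 2**58)), 'b')
    print(pi_bits)                         # the 60-bit binary expansion of pi

    target, search = 1, 2**60              # one matching sequence out of 2^60
    print(-math.log2(target / search))     # 60.0 bits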

12:07 am  
Blogger gpuccio said...

R0b:

"By "random system" do you mean uniformly random, or any distribution (which would include necessity, as Dembski has pointed out)?"

Just to be clear: random systems occur in many different forms, and they can be modeled by many different probability distributions. The choice of the most appropriate distribution to model an empirical system is a problem of methodology, not a statistical one. Again, if you have no information about the system, you can only assume the uniform distribution. That's common procedure. But if you have some knowledge of the empirical system, you can choose another distribution (and possibly verify empirically that it fits the data well enough). For instance, in many natural systems the normal distribution would apply.

Whatever the distribution, we are always speaking of random systems. A system modeled by a uniform distribution is in no way more random than one modeled by the normal distribution. Necessity has nothing to do with that.

And no random distribution can really favor higher-level information like CSI. As I have pointed out in another place, you would have to build an "ad hoc" and completely artificial distribution, assigning for instance probability 0.99 to one protein chain, and 0.01 (collectively) to all the others. And then change it for the next protein...

12:19 am  
Blogger Zachriel said...

gpuccio: My argument is: you don't know, and I have a good alternative.

Your premise is flawed. We know a great deal about how evolution works from many independent lines of evidence. What you have left is we don't know everything, so you find a Gap and put design in there.

gpuccio: if you have two completely different proteins now, it seems that you need two completely different ancestors in the past.

Zachriel: This is not necessarily true. Complex proteins may have diverged from very humble beginnings and left no discernible trace of their common ancestry.

gpuccio: It is not necessarily true, but it is true just the same.

You can't salvage a causal conditional that way. Two sequences may diverge very early in their history when they were relatively simple. Your statement that if we have two completely different proteins that they *of necessity* must have completely different ancestors is not correct.

You really need to try and understand how simple evolutionary algorithms work to test your intuitions against.

Zachriel: Are you saying that CSI only works for proteins?

gpuccio: No, I am saying that proteins (or, if you want, protein genes) are the system where it is easiest to calculate it.

Then you still haven't provided me the clear definition of CSI I requested. I would like to try a few examples, but let me make sure I understand your method. You take the specificity (i.e. the number of sequences that meet the description) divided by the number of possible sequences (uniform distribution). Is this correct?

1:06 am  
Blogger oleg said...

gpuccio,

Interesting. Why didn't you answer 1 in 2^60 prior to my revealing that the number was pi?

Suppose I give you another sequence, 1100111100 0110111011 1100110111 0010111111 1010010100 1111100000. Will you use the same answer (1 in 2^60) again?

1:26 am  
Blogger Zachriel said...

gpuccio: Your statement that if we have two completely different proteins that they *of necessity* must have completely different ancestors is not correct.

Another option is a simple frameshift. What are the odds that a frameshift won't just produce meaningless gibberish? Sort of like shifting all the letters of the alphabet and expecting an entire paragraph to still make sense.

Bopuifs pqujpo jt b tjnqmf gsbnftijgu. Xibu bsf uif peet uibu b gsbnftijgu xpo'u kvtu qspevdf nfbojohmftt hjccfsjti? Tpsu pg mjlf tijgujoh bmm uif mfuufst pg uif bmqibcfu boe fyqfdujoh bo foujsf qbsbhsbqi up tujmm nblf tfotf.
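
For anyone who wants to reproduce the scrambled paragraph, it is a one-position letter shift; a minimal sketch in Python:

    def shift(text, k=1):
        # shift each letter k positions along the alphabet, preserving case;
        # leave spaces and punctuation alone
        out = []
        for ch in text:
            if ch.isalpha():
                base = ord('A') if ch.isupper() else ord('a')
                out.append(chr((ord(ch) - base + k) % 26 + base))
            else:
                out.append(ch)
        return ''.join(out)

    print(shift("Another option is a simple frameshift."))
    # Bopuifs pqujpo jt b tjnqmf gsbnftijgu.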

2:02 am  
Blogger Zachriel said...

The attribution in my previous comment is incorrect. That was my statement, not gpuccio's.

2:06 am  
Blogger gpuccio said...

Zachriel:

"Your premise is flawed. We know a great deal about how evolution works from many independent lines of evidence. What you have left is we don't know everything, so you find a Gap and put design in there. "

Well, you obviously believe that we know a great deal, and I believe differently: I believe that what you think we know is in great part wrong.

So, we have different beliefs. I can live with that. I confess I somewhat suspected that. And you think you are right. And I think I am right. That's good for me. And we are also spending time trying to fairly compare our beliefs. That's "very" good for me.

"You can't salvage a causal conditional that way. Two sequences may diverge very early in their history when they were relatively simple."

That's exactly what I am challenging. Simple proteins have no significant function. And complex proteins do not derive from simple proteins by small random functional steps.

" Your statement that if we have two completely different proteins that they *of necessity* must have completely different ancestors is not correct."

I thought it was clear what I meant: my statement is not "logically" correct, but it is "empirically" true (if my conclusion in the previous paragraph is true, which I do believe).

"You really need to try and understand how simple evolutionary algorithms work to test your intuitions against."

Why don't you briefly explain? I would very much like to shift to the role of asking questions, instead of always having to give answers...

"You take the specificity (i.e. the number of sequences that meet the description) divided by the number of possible sequences (uniform distribution). Is this correct?"

Yes.

6:47 am  
Blogger gpuccio said...

Zachriel:

"Another option is a simple frameshift. What are the odds that a frameshift won't just produce meaningless gibberish?"

And so? I don't see your point. I perfectly agree with that. You can certainly obtain two very different proteins by a simple frameshift, but at least one will be completely non functional. In other words, you are trying to traverse the search space with a single step. And, as I have repeatedly said, you cannot really traverse the search space and get lucky.

6:51 am  
Blogger gpuccio said...

oleg:

"Interesting. Why didn't you answer 1 in 2^60 prior to my revealing that the number was pi?"

I don't understand your point either (am I getting old?). Let's say it again (at the risk of being accused of being prolix):

a) the probability of "any" random sequence of 60 bits is 1:2^60 (if we assume an uniform distribution).

b) I cannot consider a sequence specified unless I am aware of a possible specification. In that situation, I will tentatively treat any sequence as random. So I will not attribute any CSI to it. If a possible specification is offered by someone else, I will have to reconsider, because as we have said the definition itself of CSI highly favors false negatives (for instance, because an existing specification is not recognized).

c) Once a specification is suggested (by anybody), I have to evaluate it. If it is a correct specification, then I have to calculate the target space for that specification.

d) In the case of pi, the specification is very correct (pi definitely has a special meaning in our understanding of reality), and the target space is 1, because no other sequence has that meaning. That's why I answered that in the example the functional complexity was 1:2^60, which still does not qualify as CSI according to the general threshold. Still, as I have said, I am confident enough that the sequence is designed (a threshold of 60 bits is more than enough in our context: we are not discussing OOL here, but a sequence on a blog; still, a false positive at this level could be vaguely admissible).

"Suppose I give you another sequence, 1100111100 0110111011 1100110111 0010111111 1010010100 1111100000. Will you use the same answer (1 in 2^60) again?"

I would follow the same reasoning as above, and stop at point b). I must confess that almost any binary sequence appears random to me, and I do not have the proficiency, or the goodwill, to analyze it further.

7:08 am  
Anonymous Anonymous said...

gpuccio: If the sequence was pi (first figures), it was specified because there is only one pi sequence, and no other approximation.

No matter what the sequence is, there's only one of them.

I maintain that the "number" pi (in any base, be it binary or decimal) cannot "come from nature. Numbers are a creation of human mind, especially real, irrational and transcendental numbers.

Then no analysis is necessary. Oleg's sequence is a human creation simply by virtue of being numerical.

And there is no choice about the probability distribution to assume. The target space is 1, the search space is 2^60, and in the absence of any information about the random system which could have generated the sequence, you can only assume a uniform distribution.

Funny how the target is drawn post hoc. Why isn't the target "Fibonacci sequence"?

With regards to the absence of any information, Dembski says that if we do "not know enough to determine all the relevant chance hypotheses" then "a design inference could not even get going".

As to necessity, I am not aware of any natural law which can output the sequence of pi.

Obviously not. You said above that nature can't produce any numbers. Likewise the pattern "elliptical orbit" can't come from nature since nature doesn't create words.

I'm not trying to twist your words here. I'm trying to point out the nebulous nature of Dembski's concepts. You say things like "let's remember that human design can and does easily generate new CSI" as if it's an established fact, but if you were to try to do a study on this, I think you'd find "CSI" too ill-defined to be studied scientifically.

A system modeled by a uniform distribution is in no way more random than one modeled by the normal distribution. Necessity has nothing to do with that.

Is a delta function not a valid distribution? How about a distribution that's arbitrarily close to a delta function?

And no random distribution can really favor higher level information like CSI.

In one sense, that's tautologically true, since any event described by a probability distribution falls under the chance+necessity category. But I know of no empirical or logical support that such CSI exists, or is even a coherent concept.

7:33 am  
Anonymous Anonymous said...

Mark:

I again draw your attention to the comments I made at 1:26 pm and 8:44 pm yesterday on the challenge you posted, and note that the first comment in the thread answers Oleg's challenge.

That raises a question: Was your challenge a serious one towards dialogue, or just a springboard for tangential point-scoring rhetorical debates? [In which I have less than no interest.]

G'day.

GEM of TKI

__________

PS: There is also an issue raised above of how a relatively short PIN can be CSI.

"Easy," if we (a) give room for loose use of terms and (b) look at the context.

The PIN is part of an integrated functional system, so the PIN is not by itself CSI [a 4-digit decimal number stores much less than 500 - 1,000 bits of information] but is part of a CSI-bearing algorithmic system that by far exceeds the limit.

Similarly, there are individual proteins that are within the UPB, but when they are aggregated into the functionality of a cell and its processes, we see the FSCI emerging.
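
The bracketed arithmetic is easy to verify:

    import math
    # information capacity of a 4-digit decimal PIN, versus a 500-bit floor
    print(math.log2(10 ** 4))   # ~13.3 bits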

8:44 am  
Anonymous Anonymous said...

Rob:

I could not but see your comment just above:

>> I know of no empirical or logical support that such CSI exists, or is even a coherent concept.>>

So, a PPS.

First, as you should know, the CSI concept is prior to Dembski et al, and traces to OOL investigator the late Leslie Orgel in 1973:

>> Living organisms are distinguished by their specified complexity. Crystals fail to qualify as living because they lack complexity; mixtures of random polymers fail to qualify because they lack specificity.6 [Source: L.E. Orgel, 1973. The Origins of Life. New York: John Wiley, p. 189.] >>

Similarly, as TBO summarise in TMLO in 1984, ch. 8, the state of OOL studies by the mid-1980s was:

>> Yockey and Wickens develop the same distinction, that "order" is a statistical concept referring to regularity such as might characterize a series of digits in a number, or the ions of an inorganic crystal. On the other hand, "organization" refers to physical systems and the specific set of spatio-temporal and functional relationships among their parts. Yockey and Wickens note that informational macromolecules have a low degree of order but a high degree of specified complexity. In short, the redundant order of crystals cannot give rise to specified complexity of the kind or magnitude found in biological organization; attempts to relate the two have little future. >>

[The latter is of course the source for my descriptive abbreviation, functionally specified complex information, FSCI. Cf. my discussion here. And pardon that it goes over your beloved 200-word limit, as it has to substantiate facts and address a range of observed confusions and attempted rebuttals by many who will look for the slightest flaw to toss it out, baby, bathwater and all. As to style, it openly presents itself as work-in-progress notes for those wanting to dig in deeper, not a polished work intended to tickle our ears with pleasing turns of phrase [I make a classical literary allusion here].]

Such a concept is obviously quite coherent and empirically relevant.

Moreover, the very fact that you recognise the comments on this thread as the posts of intelligent agents, not artifacts of random noise, is because you intuitively understand and accept the CSI and EF concepts: it is formally possible for random noise to generate apparent messages, but the odds of such are plainly so remote that you routinely infer to the best, empirically well supported explanation: intelligent design.

In short, you have fallen into self-referential incoherence, I am afraid.

Last but not least, GP has well said how that inference to design extends to the empirically anchored inference to intelligent design from its observed reliable signs, even where humans -- who, on pain of question-begging, do not exhaust the list of possible designers -- were plainly not involved.

And, ID is the science that studies empirical signs of intelligence.

G'day, again.

GEM of TKI

9:04 am  
Anonymous Anonymous said...

PS: Pardon: the 200 word limit is set by the blog owner, though waived for this thread

9:27 am  
Anonymous Anonymous said...

Oleg

Following up from the previous thread.

There you said you continued here. I guess that's:

>> KF, as I said on the other thread, this string is not limited to 60 bits. I can supply 500, 1000, or 10000 bits of the sequence if necessary for the analysis.

All we want to see is how CSI is actually determined in a very simple case. If IDers cannot determine CSI even in simple cases, all their talk about CSI in biology is meaningless. >>

But in both that thread and in the one I have linked at UD, I have provided cases of measurements on CSI for simple cases. Further to that I have given biologically relevant cases, based on DNA in particular.

In the case of a string based on an algorithm, I have noted that the bits are functional, and that as the string length exceeds 1,000 bits, we can infer confidently to intelligent production. [Bit strings in texts are not naturally occurring after all, and the algorithms that generate them as output are functionally specific and complex.]

I have also pointed out many, many times, that ANY contextually responsive ASCII text string in say English of reasonable length [with the 1,000 bit upper limit, 143 characters] is an illustration of CSI. And, going back to the 1970's - 80's, men like Orgel, Yockey and Wickens provided examples, if not metrics.

In short, the objection was long since answered before it was made.

G'day again

GEM of TKI

12:32 pm  
Blogger oleg said...

Kairosfocus,

You did not answer my challenge. You just said the sequence was too short to be CSI. If you'd like to demonstrate your prowess, I can give you a binary sequence that is 1000 bits long. If not, I have nothing more to say to you.

12:47 pm  
Blogger Mark Frank said...

gpuccio

This comment is rather long but it is all one story so I think it will be easy enough to read.

You say


a) humans design things.

b) CSI is connected to human design, and nothing else (except biological information).

c) biological information exhibits CSI, like human artifacts.

d) therefore, a causal process similar to human design is a reasonable hypothesis for biological information.
You may accept that or not, but it's not circular.



Let me illustrate how your argument looks to me with a parallel argument. It is slightly different with respect to (b) - but I think you have (b) slightly wrong. In other respects it seems exactly parallel.

I present - the case that the origin of life is magnetic.

Some of the things we see in the world are the result (at least in part) of magnetism. Others are not. For example, the common alignment of ferrous elements in some minerals has a magnetic cause.

How do we know if an event or object has a magnetic cause?

One way is through observation and experiment. But another is by eliminating alternative causes. For example, in the case of the ferrous elements the alternative is that the ferrous minerals were aligned by chance. I can calculate the probability of this happening (subject to a particular specification for ‘aligned’ and assuming all alignments are equally likely). The negative logarithm to base 2 of this probability is the Non Magnetic Index (NMI). In other cases of course it is much harder to estimate the NMI but that is purely a practical difficulty.

I observe that some events with observed magnetic causes have high NMI. And all events with high NMI either have magnetic causes, or the cause is not yet determined; and if an alternative cause is found, the NMI turns out to be low after all. So high NMI is connected to magnetic causes.

Life has a very high NMI. The only non-magnetic cause that has been proposed is Darwinian Evolution and it is well-known that this is stupendously improbable.

So clearly something similar to magnetism is responsible for life.

Some may object that there is no plausible mechanism for magnetism to create life. But that is not our concern. At the moment we are only interested in detecting magnetism in general – not in how it operates. I personally believe in cosmic magnetism which is capable of anything – but that is a personal belief and not science.

12:50 pm  
Blogger Zachriel said...

gpuccio: Why don't you briefly explain? I would very much like to shift to the role of making questions, instead of always having to give answers...

The thread was founded as an invitation to anyone who could step us through Dembski's calculation of CSI.

Zachriel: You take the specificity (i.e. the number of sequences that meet the description) divided by the number of possible sequences (uniform distribution). Is this correct?"

gpuccio: Yes.

Well, we already have a problem. Consider, for instance, a Roulette player betting on double-zero. It might look like this (L=Lose, W=Win): LLLLLLLLLLLLLWLLLLLLLLL...

If we assume a uniform distribution and a few thousand bets, we're going to get a false positive. So I'm sure you need to consider other distributions besides a strictly uniform distribution. How do we determine P(T|H)?
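
A quick simulation makes the worry concrete (a sketch only; 1/38 is the single-number win probability on an American wheel):

    import math, random

    random.seed(1)                         # a reproducible toy run
    n = 1000
    spins = ''.join('W' if random.random() < 1/38 else 'L' for _ in range(n))

    # a uniform null over {L, W} strings assigns every outcome n bits of
    # "surprise", although ordinary roulette odds explain the long runs of L's
    print(spins.count('W'), 'wins;', -n * math.log2(0.5), 'bits on the uniform null')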

I'm not trying to be difficult, but that is exactly the type of question I have been asking on various forums for months.

12:51 pm  
Blogger Zachriel said...

Zachriel: Another option is a simple frameshift. What are the odds that a frameshift won't just produce meaningless gibberish?

gpuccio: And so? I don't see your point. I perfectly agree with that. You can certainly obtain two very different proteins by a simple frameshift, but at least one will be completely non functional.

Except that frameshifts do not always yield non-functional sequences. For instance, nylonase was formed by a frameshift. And once formed, it continued to diverge as it optimized to its new role.

12:59 pm  
Blogger Zachriel said...

gpuccio: Simple proteins have no significant function.

Sequences shorter than 20 or 30 residues don't always have a well-defined three-dimensional structure, but they can still have biological function. Known enzymatic proteins can be as short as 62 residues. And randomly generated sequences of length 80 can be functional.

Keefe & Szostak, Functional proteins from a random-sequence library, Nature 2001.

1:09 pm  
Anonymous Anonymous said...

Mark:

In the Wales thread, I have had to remark on the attempted "outing" of my name, by someone who by his own confession has had an earlier experience with the reason why I ask for my personal name not to be used in relatively high traffic corners of the 'net. I have commented on it there and requested action on your part.

I must also note that the attempted "outing" of ID supporters is a known harassment tactic, as it may provoke institutional retaliation.

I am sorry to have to point out such to you, but that sort of hitting below the belt is a known part of what is going on.

GEM of TKI

PS: I have also remarked on several matters of substance there and in the yet earlier thread, for the record.

PPS: Oleg, you are free to say anything you want. I have objectively addressed the issue you have raised. To summarise: [a] a 60-bit string with no context is simply too short and not known to be functional, so the EF will rule that it is chance, accepting the likelihood of a false negative as that is immaterial to its purpose [bits are inherently contingent]. A 1,000-bit string that is the product of an algorithm [or even the 60-bit one] will be FSCI in action, though most of the relevant information -- codes, algorithms, physical execution machinery etc -- will be hidden.

1:28 pm  
Blogger Zachriel said...

kairosfocus, can you use Dembski's or gpuccio's calculation of CSI on a few examples we put forth so that we can see the arithmetic? How do we determine P(T|H)?

(For convenience, we may use ellipses to indicate the full length of the sequence.)

1:48 pm  
Blogger oleg said...

gpuccio,

This time the number was the golden ratio φ=1.6180339887... I suppose now you will say that it's quite specific: there is only one golden ratio. (Well, technically, two: −0.6180339887... is another golden ratio).

That brings us to a realization that it doesn't make sense to compute CSI of a single number. Since I mentioned that I had used pi for my password, it is more sensible to ask how specific are Oleg's passwords? You have now seen two of them. Formulate a hypothesis and try to estimate how many passwords I may have up my sleeve. That would be closer to calculating CSI of proteins (we'll get back to that).
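
oleg's revelation is easy to verify; a minimal sketch (again assuming mpmath) regenerates the 60-bit string from the golden ratio:

    from mpmath import mp

    mp.prec = 100
    phi = (1 + mp.sqrt(5)) / 2
    print(format(int(mp.floor(phi * 2**59)), 'b'))
    # 110011110001101110111100110111001011111110100101001111100000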

2:00 pm  
Blogger gpuccio said...

R0b:

"No matter what the sequence is, there's only one of them."

Let's say then: there is only one sequence corresponding to the numeric value of pi. Must we play with words? We have been talking of specification. We have said that here the specification is the correspondence with pi. What I obviously meant is that, if the specification is a protein function, there are many sequences which are included. In this case, there is only one. Is it clearer now?

"Then no analysis is necessary. Oleg's sequence is a human creation simply by virtue of being numerical."

OK, let's be patient. We have a sequence in some physical system where the events can be interpreted as a sequence of 0s and 1s. We have a sequence of events. We have two possible theories: the sequence of events is random, and can be modeled according to some probability distribution (let's assume uniform). Or, the sequence has been intelligently guided by an agent to express the digits of pi. Therefore, if the random theory is chosen, the sequence does not really have a numeric nature, even if we can interpret it as a binary number, indeed a random one. But if the sequence was used to convey a mathematical meaning, then it is specified, and designed. We are just comparing two different interpretations of a physical system.

"Funny how the target is drawn post hoc. Why isn't the target "Fibonacci sequence"?"

If you had presented a sequence corresponding to the Fibonacci sequence, that would have been the target. For the analysis, we draw the target after we have identified the specification. How could we draw it before? And why is it funny?

"Obviously not. You said above that nature can't produce any numbers. Likewise the pattern "elliptical orbit" can't come from nature since nature doesn't create words."

Again, you don't understand what I mean. Perhaps I am trying to be too brief! What I mean is obviously that I am not aware of any natural law which can output events which can be read as the sequence of pi (see the previous point). In the same sense, you find elliptical orbits in nature, but you don't find sequences of physical events which can be read as a mathematical description of an ellipse. You seem to miss the difference between a physical event and a physical description of the event in a symbolic code.

More in the next post.

3:12 pm  
Blogger gpuccio said...

R0b:

"You say things like "let's remember that human design can and does easily generate new CSI" as if it's an established fact, but if you were to try to do a study on this, I think you'd find that "CSI" too ill-defined to be studied scientifically."

Let's put it in an even simpler way. I define here CSI in a minimal way: any sequence of digital information which has the following properties:

a) it is at least 500 bits long.

b) it is specified. For the moment, we define specification as one of three things: 1) it corresponds to some known mathematical object; 2) it corresponds to some meaningful and correct discourse in the English language; 3) it describes the sequence of a known functional protein.

c) we are aware of no known way to generate it in a spontaneous (non designed) physical system by laws of necessity.

Can you accept this definition? I am not giving here any other meaning to the word CSI. And I am not saying that this CSI is always recognizable. I am just saying that if a digital sequence has those three characteristics, we say that it has the property of CSI (in this post).
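
As a sketch only, the definition reduces to a three-part test in which (b) and (c) are judgments supplied by the analyst rather than anything computed:

    def has_minimal_csi(bit_string, is_specified, law_generable):
        # (a) at least 500 bits; (b) specified per one of the three criteria
        # above; (c) no known non-design law of necessity that outputs it
        return len(bit_string) >= 500 and is_specified and not law_generable

    print(has_minimal_csi('1' * 60, True, False))    # False: specified but too short
    print(has_minimal_csi('1' * 600, True, False))   # True, if (b) and (c) really hold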

The pi sequence, if we imagine extending it to 500 or more bits, has the property of CSI as here defined. Can you agree on that?

Well, I say that it is an established fact that human design can easily generate that sequence, and no random system can. Is that simple?

Well, the "minimal" definition of CSI used here is not so different from my general definition, only I have reduced it to the essential to avoid ambiguities. Above all, I have restricted, and defined explicitly, the specification, so that we all can agree that it is an objective definition. And I am implying no extra meaning: only the empirical fact that there exist some digital sequences which we can define as having CSI, and that for those sequences it is very easy to verify that a human designer can generate them through a design process, while a random system cannot.

"Is a delta function not a valid distribution? How about a distribution that's arbitrarily close to a delta function?"

A delta function is a mathematical object which "can" be interpreted as a probability distribution. But the only physical system which it can model is strict necessity. I have myself given an example of a distribution close to a delta function in my post: I paste here:

"you should build an "ad hoc" and completely artificial distribution, assigning for instance propability 0.99 to one protein chain, and 0.1 to all the others. And then change it for the next protein..."

But tell me, what kind of physical system are you thinking you can model by such a distribution? Let's remember that we can build any distribution we want, if we respect the mathematical constraints, but that does not mean that they are of any value in the physical world.

"But I know of no empirical or logical support that such CSI exists, or is even a coherent concept."

See above. But you are entitled to your own opinions, and it is not my purpose to convince you.

3:40 pm  
Blogger gpuccio said...

Zachriel:

"The thread was founded as an invitation to anyone who could step us through Dembski's calculation of CSI."

Is that a reason not to explain? You disappoint me.

"Well, we already have a problem. For instance, the winner of a Roulette betting on double-zero. It might look like this (L=Lose, W=Win): LLLLLLLLLLLLLWLLLLLLLLL...

If we assume a uniform distribution and a few thousand bets, we're going to get a false positive. "

I am not sure I understand what you mean. Could you explain better, please? In what sense are we going to get a false positive?

3:44 pm  
Blogger gpuccio said...

Zachriel:

"Except that frameshifts do not always yield non-functional sequences. For instance, nylonase was formed by a frameshift. And once formed, it continued to diverge as it optimized to its new role."

Ah, nylonase! I suspected you would bring that up. Aren't you selling a very vague theory as proved?

Anyway, it has been a long time since I last evaluated the nylonase issue. If I find time, I will try to refresh my memory, and give you some more specific comment.

3:47 pm  
Blogger gpuccio said...

Zachriel:

"Sequences shorter than 20 or 30 residues don't always have a well- defined three-dimensional structure, but they can still have biological function. Known enzymatic proteins can be as short as 62 residues. And randomly generated sequences of length 80 can be functional."

I know that paper, which is interesting; I have already commented on it at UD recently. Just consider that 62 or 80 amino acids is not so little (even if it is for a medium-sized protein). They correspond to spaces of 10^80 and 10^104. And the proteins in the paper were not randomly generated, but selected after cycles of an engineering process of random variation and intelligent selection. I cite from the abstract:

"Starting from a library of 6 x 1012 proteins each containing 80 contiguous random amino acids, we selected functional proteins by enriching for those that bind to ATP. This selection yielded four new ATP-binding proteins that appear to be unrelated to each other or to anything found in the current databases of biological proteins."

3:55 pm  
Blogger Zachriel said...

gpuccio: c) we are aware of no known way to generate it in a spontaneous (non designed) physical system by laws of necessity. Can you accept this definition?

Your definition just changed.

gpuccio: Is that a reason not to explain?

I didn't mean to suggest that I would not explain evolutionary algorithms, only pointing out why I am asking questions.

Zachriel: Well, we already have a problem. For instance, the winner of a Roulette betting on double-zero. It might look like this (L=Lose, W=Win): LLLLLLLLLLLLLWLLLLLLLLL... If we assume a uniform distribution and a few thousand bets, we're going to get a false positive.

gpuccio: I am not sure I understand what you mean. Could you explain better, please? In what sense are we going to get a false positive?

Because the pattern is a far cry from a uniform distribution. Let's simplify it further. Let's assume a pattern of all L's. This is highly specified. Assuming a uniform probability distribution (appropriate for someone flipping a coin), we would suspect design (a two-sided coin perhaps). Yet we might get this same distribution from observing any manner of regular occurrences.

But you have since changed the definition. You do see that you have done this, don't you?

6:59 pm  
Blogger Zachriel said...

gpuccio: They correspond to spaces of 10^80 and 10^104. And the proteins in the paper were not randomly generated, but selected after cycles of an engineering process of random variation and intelligent selection.

Read it again. They "enriched" random sequences that had function. In essence, they separated out those with function. About one in a trillion sequences with 80 residues will have biological function.

7:02 pm  
Blogger Zachriel said...

Selection only works when there is some function. There is some variation during amplification, but they form distinct families.

7:32 pm  
Blogger gpuccio said...

Zachriel:

"Your definition just changed."

From my post to oleg (which had a specific context):

"Let's put it in an even simpler way. I define here CSI in a minimal way: any sequence of digital information which has the following properties:

a) it is at least 500 bits long.

b) it is specified. For the moment, we define specification as one of three things: 1) it corresponds to some known mathematical object; 2) it corresponds to some meaningful and correct discourse in the English language; 3) it describes the sequence of a known functional protein."

c) we are aware of no known way to generate it in a spontaneous (non designed) physical system by laws of necessity."

First of all, this is a simplified definition for that post, just to make a very clear argument in response to oleg's objections about the nature of CSI. But then I say:

"Well, the "minimal" definition of CSI used here is not so different from my general definition, only I have reduced it to the essential to avoid ambiguities. Above all, I have restricted, and defined explicitly, the specification, so that we all can agree that it is an objective definition."

In other words, here I have given partial, but completely explicit, definitions of some kinds of specification.

But in essence I have changed nothing. CSI is defined by three properties: complexity (a probability lower than 1 in 10^150); specification; and no necessary causal mechanism. If you would be so kind as to specify what I have changed, we can discuss.

Regarding the roulette example, I understand that your argument is as follows: in a system where one event (W) has a probability of 1/38 and the other event (L) has a probability of 37/38, if against all evidence we assume a uniform distribution of the two events, we will get a wrong calculation of the probability of the sequences we observe. If we consider a sequence of all Ls as specified, we could count that as a "positive" for CSI, and it would be a false positive.

Please, confirm if my understanding is correct, and I will give you my comments.
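
While I wait for your confirmation, here is the arithmetic as I understand it, in a quick sketch (the number of spins is arbitrary):

```python
import math

n = 1000                    # spins observed, all losses
p_loss = 37 / 38            # true probability of L on a double-zero bet

# Improbability of the all-L sequence, in bits, under each assumption:
bits_true = -n * math.log2(p_loss)   # ~38.5 bits: unremarkable
bits_uniform = n                     # -n * log2(1/2) = 1000 bits: "CSI"!

print(bits_true, bits_uniform)
```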

7:43 pm  
Blogger gpuccio said...

Zachriel:

about the paper, unfortunately I could not read the whole text. I paste here a comment by Rachel Brem on Genome Biology:

"As a protein design study, this paper hits the mark expertly. As a study of sequence evolution rates, there are some gray areas. For example, proteins from Keefe and Szostak's first eight rounds of selection appear to use their tethered mRNA in binding ATP; thus, strictly speaking, 1011 80-mers alone may not yield one functional protein. The next step for Keefe and Szostak will be to estimate whether nature has exhaustively searched functional protein space. To do this, they will need to characterize their isolates structurally and compare them with known ATP-binding proteins."

I am not saying that the paper is not interesting. Indeed, we need more studies like that to quantify aspects of the target space which are at present poorly understood. I look forward to new data from protein engineering, and I am ready to discuss them.

7:55 pm  
Blogger Zachriel said...

gpuccio: But in essence I have changed nothing. CSI is defined by three properties: complexity (a probability lower than 1 in 10^150); specification; and no necessary causal mechanism. If you would be so kind as to specify what I have changed, we can discuss.

This is what you said previously. (And I had to work to get this.)

Zachriel: Then you still haven't provided me the clear definition of CSI I requested. I would like to try a few examples, but let me make sure I understand your method. You take the specificity (i.e. the number of sequences that meet the description) divided by the number of possible sequences (uniform distribution). Is this correct?

gpuccio: Yes.

So, the definition now appears to be the Explanatory Filter. Please note that Dembski's definition of CSI doesn't appear to include an explicit term for causal mechanism. Is it any wonder there is confusion on this?
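
For reference, the method as just quoted, in sketch form (my notation, not necessarily Dembski's):

```python
import math

def csi_bits(n_matching: int, n_total: int) -> float:
    """-log2 of (sequences meeting the description / all possible
    sequences), assuming a uniform distribution."""
    return -math.log2(n_matching / n_total)

# One fully specified 60-bit string out of all 60-bit strings:
print(csi_bits(1, 2**60))   # 60.0 bits
```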

8:11 pm  
Blogger gpuccio said...

Zachriel:

I don't see all those problems that you seem to see. I am not trying any tricks. The elimination of a causal mechanism of necessity seems to be an obvious step. If there is a known necessity mechanism which can explain the data, that is obviously the explanation. Where is the problem?

And please, could we stop citing Dembski at every turn? Are we talking science, or doing exegesis? It seems that darwinists are much more Dembski-dependent than we IDists are.

9:35 pm  
Blogger gpuccio said...

Zachriel:

And I am still expecting that explanation about evolutionary algorithms.

9:38 pm  
Blogger gpuccio said...

Mark,

I would like to answer your post, but I have not had the time to consider it seriously. I will do that as soon as possible.

9:41 pm  
Blogger Zachriel said...

Zachriel: So, the definition now appears to be the Explanatory Filter.

gpuccio: I don't see all those problems that you seem to see.

Apparently, information being complex and specified is not enough for it to be complex specified information.

You're assuming a probability distribution simply because you don't know of a "necessity mechanism". This is a classic argument from ignorance. The more ignorant you are, the more design you will detect.

gpuccio: And I am still expecting that explanation about evolutionary algorithms.

I'm not sure what you're asking. I might say that evolutionary algorithms can generate specified complexity, but it doesn't seem to be a well-defined metric. Was there something specific?

11:21 pm  
Anonymous Anonymous said...

gpuccio, you've been an extremely good sport here. I'm sorry I've been a pain in the neck. My point in quoting Dembski is to show the inconsistency and disarray that characterizes the ostensible science of ID. I think it's a natural consequence of sloppy and nebulous foundational concepts. I second David Wolpert's appraisal of Dembski's framework as "fatally informal and imprecise".

It's true that this is a nontechnical forum, so we don't expect formality or rigor. The problem is that ID rigor isn't to be found anywhere. You can search the math and science literature in vain for a technical treatment of specified complexity. In the few cases in which Dembski makes concrete claims that are mathematically or empirically falsifiable, those claims either turn out to be wrong or they offer no verifiable support for his ID claims, which are themselves poorly defined.

Equivocal language exacerbates the problem. Dembski says that he started using the term "specified complexity" in place of "specified improbability" because Orgel and Davies used the former term. There's no reason to think that Orgel and Davies were referring to improbability, and Dembski has never tried to make a case for it as far as I know. The unconventional usage of "complexity" has caused quite a bit of confusion in the ID community, even among its leaders. Meyer thinks that specified complexity entails irregularity and algorithmic incompressibility, and O'Leary, Thaxton, and Durston think that it entails aperiodicity.

If the ID movement holds that CSI is a well-defined concept, that position is easily tested. Simply have several different people work through the same CSI problem independently and see how close the results are. The participants can even be mathematicians, or top-tier ID leaders. Dembski could have done this in the intelligent design class he taught, but I don't see any evidence that he ever had his students actually work through a problem. Is there really a right answer to CSI problems?

BTW, gpuccio, have you concluded 60 bits of specified complexity for oleg's sequence? Do you think that Dembski would agree with that? How much would you be willing to bet on that?

Okay, I'm shooting off my mouth. Thanks, gpuccio, for your grace in the face of my hostility. You're a true gentleman. (Or lady?)

11:26 pm  
Blogger Zachriel said...

gpuccio: And I am still expecting that explanation about evolutionary algorithms.

At Uncommon Descent, dgosse posted this example to show the implausibility of evolution.

dgosse: To spell the word “evolution,” obtaining the nine letters in order, each having a 1/26 probability, you have a probability of 1 in 5,429,503,678,976. This, as you will realize, comes from multiplying 26 by itself, using the figure 9 times. If every five seconds day and night a person drew out one letter, he could expect to succeed in spelling the word “evolution” about once in 800,000 years!

This is a typical example of an ID calculation. We have a sequence that we randomly mutate. But let's consider an evolutionary algorithm.

We have a population of letter sequences. We mutate and recombine them in generations. If the offspring don't perfectly spell a word, the offspring are eliminated without issue. We might limit the population and delete the shortest words as less fit. Start with the single-letter word "O". Would you expect a similar number of trials before you saw the first nine-letter word?

o, to, top, stop, stomp, stomps, ?

Or maybe a nine-letter word will never evolve being so isolated in the vast oceans of meaningless sequences.
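
Here is a minimal sketch of the algorithm I have in mind. The dictionary is a toy stand-in (a real run would load a full word list), and the only mutation operator is a single-letter insertion:

```python
import random

WORDS = {"o", "to", "top", "stop", "stomp", "stomps"}   # toy dictionary

def mutants(word):
    """All single-letter insertions into the word."""
    for i in range(len(word) + 1):
        for c in "abcdefghijklmnopqrstuvwxyz":
            yield word[:i] + c + word[i:]

def evolve(start="o", generations=20000):
    population = {start}
    for _ in range(generations):
        parent = random.choice(sorted(population))
        child = random.choice(list(mutants(parent)))
        if child in WORDS:              # misspelled offspring are stillborn
            population.add(child)
    return max(population, key=len)

print(evolve())   # typically reaches "stomps", one perfect word at a time
```

The contrast with the one-shot random draw is the point: the algorithm only ever has to find a one-letter step, never a nine-letter jackpot.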

12:10 am  
Blogger Zachriel said...

R0b: gpuccio, you've been an extremely good sport here.

Yes he has. I'm a slow learner and ask a lot of questions.

12:13 am  
Blogger Zachriel said...

There are about 10000 nine-letter words in my dictionary, so only one in 500 million nine-letter sequences form perfectly spelled words. And remember, we have to approach new words by single steps, with each step also being a perfectly spelled word.
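
For the record, the arithmetic:

```python
print(26**9)           # 5,429,503,678,976 nine-letter sequences
print(26**9 // 10000)  # ~543,000,000: about one word per 500 million
```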

12:24 am  
Anonymous Anonymous said...

Zachriel

I refer you to the above where the specific cases raised at the head of the thread have been discussed.

As to the onward issue of how probabilities are estimated, I refer you to the discussion in appendix 1 of my online note, point 9, for Bradley's discussion of Cytochrome C in light of his earlier one on the generic protein in TMLO circa 1984. You will see how he adjusts the default, highly useful Laplacian indifference flat distribution on empirical grounds, while using Brillouin information [negentropy per Boltzmann's s = k ln w] as a metric of CSI. I excerpt:

>> Cytochrome c (protein) -- chain of 110 amino acids of 20 types

If each amino acid has pi = .05 [i.e. flat distribution], then average information “i” per amino acid is given by log2 (20) = 4.32

The total Shannon information is given by I = N * i = 110 * 4.32 = 475, with total number of unique sequences “W0” [this is the omega of Boltzmann's s = k ln w] that are possible is W0 = 2^I = 2^475 = 10^143

Amino acids in cytochrome c are not equiprobable (pi ≠ 0.05) as assumed above. [i.e. he here brings to bear the empirical data on this protein; nb the wider context that he is the one whose work led Kenyon to abandon his prior Biochemical Predestination thesis, through showing that AA residues do not show substantial and systematic departures from quasi-uniform bonding patterns of AA's in proteins.]

If one takes the actual probabilities of occurrence of the amino acids in cytochrome c, one may calculate the average information per residue (or link in our 110 link polymer chain) to be 4.139 using i = - ∑ pi log2 pi [TKI NB: which is related of course to the Boltzmann expression for S]

Total Shannon information is given by I = N * i = 4.139 x 110 = 455.

The total number of unique sequences “W0” that are possible for the set of amino acids in cytochrome c is given by W0 = 2^455 = 1.85 x 10^137

. . . . Some amino acid residues (sites along chain) allow several different amino acids to be used interchangeably in cytochrome-c without loss of function, reducing i from 4.19 to 2.82 and I (i x 110) from 475 to 310 (Yockey)

M = 2^310 = 2.1 x 10^93 = W1

Wo / W1 = 1.85 x 10^137 / 2.1 x 10^93 = 8.8 x 10^44 [Source]>>
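
For onlookers who want to re-run the excerpted figures, a sketch. I use Bradley's summary values (4.139 bits per residue observed, 310 functional bits per Yockey) rather than his full frequency table, which is not reproduced above:

```python
import math

N = 110                     # residues in cytochrome c

i_flat = math.log2(20)      # 4.32 bits/residue on a flat distribution
I_flat = N * i_flat         # ~475 bits; W0 = 2^475 ~ 10^143

i_obs = 4.139               # bits/residue from observed AA frequencies
I_obs = N * i_obs           # ~455 bits

I_func = 310                # bits after allowing functional substitutions
log10_ratio = (I_obs - I_func) * math.log10(2)
print(f"W0/W1 ~ 10^{log10_ratio:.1f}")   # ~10^43.7; the excerpt's
# "8.8 x 10^44" appears to be a slip for 8.8 x 10^43
```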

For a much simpler instance of the utility of functionally specific bits and the 500 - 1,000 bit threshold, do you know of cases where comments in this thread or elsewhere, in contextually responsive English, of length greater than 143 ASCII characters [~ 18 words of average length], are credibly the result of lucky noise, not intelligence? [And Genetic Algorithms with their oracles are an example of active information and design.]

Or, even more simply: why do you take it as default that apparent posts in this thread are real posts, not mere lucky noise and/or the product of deterministic, blind, lawlike natural forces?

In short, I believe there is good reason to see that there have long since been adequate and reasonably easily accessible examples of FSCI/CSI at work for serious discussion.

G'day

GEM of TKI

PS: Per the biological situation, the challenge on 9-letter words should also require that the words be contextually functional in existing sentences at each stage, or in novel ones that also come together by chance at the right time to function through co-optation of words in a pre-existing paragraph, say. [For enzymes etc. must function in existing contexts, or, per the answer to IC, must co-opt existing functionality.]

6:50 am  
Anonymous Anonymous said...

Mark

I again note that I have posted a suggested initial response on your Dembski X-metric challenge at UD, here.

GEM of TKI

6:53 am  
Blogger Mark Frank said...

KF

I again note that I have posted a suggested initial response on your Dembski X-metric challenge at UD, here.

Thanks for going to the effort, and I am sorry not to respond. You have a style of debating which I find extremely hard work. For that reason I am avoiding following up on any of your comments until I have more time. Others, of course, may wish to do so.

I would be delighted if you continue to contribute. I just want to explain why I am not responding.

Mark

7:31 am  
Blogger Mark Frank said...

gpuccio

Mark,

I would like to answer your post, but I have not had the time to consider it seriously. I will do that as soon as possible.


Thanks. I decided that this was sufficiently far off topic to form a new post which is a more complete account.

10:21 am  
Blogger Zachriel said...

kairosfocus: G'day

All your verbiage is just saying you assume the protein was thrown together randomly.

Can you calculate Dembski's CSI for the binary series 111111...?

kairosfocus: the challenge on 9-letter words should also require that the words be contextually functional in existing sentences at each stage

That wasn't the question. Comparing a random search and an evolutionary algorithm (as described above), would you expect a similar number of trials before you saw the first nine-letter word? Or would a nine-letter word never evolve, being so isolated in the vast oceans of meaningless sequences?

12:49 pm  
Blogger Zachriel said...

kairosfocus: If one takes the actual probabilities of occurrence of the amino acids in cytochrome c ...

This does seem to answer how to determine P(T|H). Take the average frequency over the given sequence. Is that correct? That may help with the example 111111..., but I'll wait to see your answer.

1:21 pm  
Blogger Zachriel said...

kairosfocus: [And Genetic Algorithms with their oracles are an example of active information and design.]

I provided an example of an evolutionary algorithm. The only oracle is whether a sequence spells a word perfectly. If not, the mutant is stillborn. There is no nearer or farther.

1:29 pm  
Anonymous Anonymous said...

Mark:

You asked for a calculation; I have provided it and drew it to your attention several days ago.

In essence I have given p(T|H) based on the relevant combinations, and fS(T) based on the existence of three other similar outcomes. Log manipulations gave X as -361, much lower algebraically than the threshold for CSI, 1. No great surprise, as a 13-card hand is far more probable in that context than 1 in 10^150.
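
For convenience of onlookers, the calculation in compact form, if I reconstruct the numbers correctly: 10^120 as the upper bound on the probabilistic resources M·N, fS(T) = 4 for the four one-suit hands, and p(T|H) = 1/C(52,13):

```python
import math

phi_S = 4                          # the four "all one suit" hands
p_T_H = 1 / math.comb(52, 13)      # 1 / 635,013,559,600
resources = 10**120                # upper bound on trials (M*N)

X = -math.log2(resources * phi_S * p_T_H)
print(round(X))                    # -361: well below the CSI threshold of 1
```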

Now, you may or may not agree with the interpretation and may or may not like my "style" of taking a step by step methodical approach to complex matters [as I think appropriate with brief non-standalone summaries such as just above], but there it is:

(i) WmAD's metric can be applied to your test case, and

(ii) there is at least one commenter at UD willing to make an attempt at such a calculation.

(Onlookers: Both of these go to remarks MF made at UD, which is where I posted the suggested solution.)

G'day

GEM of TKI

5:18 am  
Anonymous Anonymous said...

__________________________

PS: Zachriel, I note:

1 --> You will note that Bradley [NOT GEM -- I am explicitly citing a presentation . . .] -- BTW, a polymer expert -- was discussing the issue of origination of a representative short protein by chance forces in a prebiotic, OOL environment; as Ch 8 of TMLO, 1984, explicitly addressed.

2 --> Bradley was giving an update to the generic protein discussion in that chapter. So, it is entirely in order to consider the production of such a contingent outcome on the null hyp of chance, and to adjust from Laplacian indifference by using observed frequencies of AA's in the Cytochrome C family, as is evidently also done by Yockey et al, whom he refers to later. This is also connected to the basic information capacity metric used by Shannon et al.

3 --> In so citing, I give a THIRD information metric for CSI/FSCI in sufficient detail for responsible onlookers to see how it is arrived at. It is not mere verbiage -- the dismissive rhetorical tactic of choice for avoiding dealing with issues in this thread.

4 --> 11111 . . . is an example of orderly sequence complexity, which has been distinguished from FSCI/CSI ever since the 1970's, by Orgel et al.

5 --> It is specified but not complex: repeat "1." It is an irrelevancy, as step one of the EF -- low/no contingency -- would have eliminated it from further consideration. [And, if the sequence is the product of a program or algorithm, the focus would shift to that . . .]

6 --> As to the context of the 9-letter word, I simply pointed out as a relevant but perhaps overlooked input, that it needs to be in a complete sentence to be properly parallel to the bio-functionality issue that is at the heart of the discussion.

7 --> Genetic Algorithms, in ALL known cases, are intelligently designed, and carry out constrained [pseudo-] random searches.

8 --> A 1/0 no-"warmer while being non-functional" oracle is a better one than the sort that is commonly met with. But functionality in the real case involves more than getting a meaningful sequence; it has to be in a useful context. That is, 1/0 on an overly simplistic function is in effect just as irrelevant to the real issue as is the more common "broadcasting oracle" problem.

GEM of TKI

5:20 am  
Blogger gpuccio said...

R0b: "gpuccio, you've been an extremely good sport here."

Zachriel: "Yes he has. I'm a slow learner and ask a lot of questions."

Well, thank you for those comments. They really mean a lot to me, and not only in a personal sense.

I am very happy with the discussion we had here. It was fun, I learnt a lot, and it was certainly cognitively rewarding. But it was also very demanding in terms of my personal resources.

I was very busy yesterday, and I took a little rest. That helped me realize, however, that I cannot commit to going on indefinitely at this rate of discussion here and still do all the other things which I should do.

There are many things I would like to add on the issues we have touched, and I am sure the same is true for you. I was tempted to make a summary of where we stand, but then I realized that probably it is better to leave the discussion "open". "Open" is a word I like very much, and I must credit you with true open-mindedness and willingness to listen and discuss. Thank you for that.

So, I think I will just give Mark a long due answer on the new thread about magnetism, and then take some rest (I hope).

Anyway, I am sure we'll meet again... :-)

7:46 am  
Blogger Zachriel said...

kairosfocus: Now, you may or may not agree with the interpretation and may or may not like my "style" of taking a step by step methodical approach to complex matter

Are you saying that you can't calculate Dembski's CSI? Or that you reject its validity?

kairosfocus: It is specified but not complex: repeat "1."

I have tried for months to get people to provide the actual calculations of the CSI for a few simple examples. How do you determine P(T|H)? What is the CSI? Please show your arithmetic. I'd like to try several examples.

kairosfocus: It is on an irrelevancy as step one of the EF, low/no contingency would have eliminated it from further consideration.

Are you saying we can't calculate the CSI of such examples? What if we aren't cognizant of the order and make the calculation anyway?

12:42 pm  
Anonymous Anonymous said...

Zachriel

It seems this is my final remark to you; as we have now clearly passed the point of reasonable and productive dialogue.

For, it is plain that WITH AN EXAMPLE SITTING IN FRONT OF YOU just a link away, you still wish to act as though the issue of calculating in accord with the Dembski metric has not been at least initially addressed, per the challenge as laid out in this blog. That includes how p(T|H) is estimated, why, and how fS(T) may also be addressed, plus the resulting estimate for the case in view as set by Mark. In short, though cumbersome, the Dembski metric can be used.

On your further instance, 1111 . . . you seem to wish to act as though it has not been recognised, ever since 1973, that order and specified complexity are utterly different and reasonably and objectively distinguishable, so that the CSI metrics address not order but the marking out of directed from undirected contingency.

I cite Orgel, 1973:

>> Living organisms are distinguished by their specified complexity. Crystals fail to qualify as living because they lack complexity; mixtures of random polymers fail to qualify because they lack specificity.6 [Source: L.E. Orgel, 1973. The Origins of Life. New York: John Wiley, p. 189.] >>

Furthermore, I have noted the challenges of the Dembski metric, and have put forth, as linked at the very first post in this thread, a simpler but effective one: functionally specific bits. I have also cited from the earlier, thermodynamically based approach, Brillouin information.

As a simpler rule of thumb, if an entity has function-specifying information, and requires at least 500 - 1,000 functionally specific bits of storage capacity to carry its required information for that function, then we can be fairly confident that the entity is designed. As an instance, a string of about 18 contextually responsive words in ASCII English text will be, to moral certainty, designed and not lucky noise. For 128^143 ~ 2^1,000. [If you doubt its effectiveness, just try to come up with an exception of known origin, where such a string or the equivalent has occurred by lucky noise. The whole Internet stands as an instantiation of how well supported this rule of thumb is.]
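
In sketch form, taking 7 bits of storage capacity per ASCII character:

```python
def functionally_specific_bits(text: str) -> int:
    """Storage capacity of ASCII text at 7 bits per character."""
    return 7 * len(text)

print(functionally_specific_bits("x" * 143))   # 1001 bits: 128^143 ~ 2^1001
```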

In short, enough has been said, long since.

Good day sir.

GEM of TKI

3:35 pm  
Blogger Zachriel said...

kairosfocus: On your further instance, 1111 . . . you seem to wish to act as though it has not been recognised, ever since 1973, that order and specified complexity are utterly different and reasonably and objectively distinguishable, so that the CSI metrics address not order but the marking out of directed from undirected contingency.

We can't always distinguish order.

You don't actually say we can't calculate the CSI for an arbitrary string. But we know you haven't for the example provided.

Of note, Dembski considers such strings in his paper.

kairosfocus: It seems this is my final remark to you; as we have now clearly passed the point of reasonable and productive dialogue.

Those were the easy calculations. We haven't even got to the interesting questions yet.

4:49 pm  
Blogger Zachriel said...

kairosfocus: Genetic Algorithms, in ALL known cases, are intelligently designed,

Yes, that's what we typically mean by an algorithm. I note you didn't attempt to answer the questions.

kairosfocus: 1/0 on an overly simplistic function

Yes, it's simple by design. That way we can test our intuitions and knowledge of how such processes work. Later, we can test more complex examples.

5:36 pm  
Anonymous Anonymous said...

On a positive note, my hat is off to both gpuccio and Kairosfocus for coming up with numbers. I salute the both of you.

If I understand correctly, gpuccio is saying that oleg's sequence has 60 bits of CSI, and Kairos is saying that the hand of spades has about 361 bits of CSI. Do I understand correctly?

If so, here's my question: If Dembski could be persuaded to solve these same problems, do you think he'd come up with these same numbers?

6:04 pm  
Blogger gpuccio said...

R0b:

Just a quick answer to your question. Knowing what Dembski would say implies mind reading and prophecy, and I am not yet proficient enough.

My idea is that CSI is a reality which can be defined and measured in slightly different ways, like many other scientific realities. As we are still in a pioneering stage of CSI theories and research, it's perfectly natural that there is not universal consensus on the various approaches. Darwinists, who have a special interest in trying to demonstrate that CSI is a false concept, obviously try to use those differences as a weapon to affirm that the concept itself is inconsistent. But that's not the case.

I have tried to show that the important thing is not that all of us share one universal definition of CSI, but rather that in each discussion one commit to giving one's definition explicitly and clearly and to staying consistent with it in all subsequent arguments.

11:52 pm  
Blogger Mark Frank said...

KF

On consideration I think it is unreasonable not to respond to your calculation of CSI for 13 spades.

My question would be: what is your justification for fS(T)? UD seems to be down at the moment, but as I remember you justified including the other suits with a phrase along the lines of "similar", which is hardly rigorous.

Here are some other criteria you might have used.

1) A functional definition:

Hands which will certainly give you a grand slam if your bid is successful. There are of course many such hands - consider how many hands will certainly win with 7 NT.

2) Hands which can be described in a minimal number of concepts. Well, that could include "has no aces", which describes a very broad set of hands.

3) Hands which can be completely specified by the shortest possible computer programme. Consider that all the cards can be ordered by suit within rank: Ace of Spades at the top and 2 of clubs at the bottom. Then a very simple programme is "select the top 13 cards" or "the bottom 13" or start at any card and choose the next 13. There is no reason to suppose this is longer than "choose 13 cards of the same suit".

And so on ....

My objective is only to show that whatever method is chosen it is ambiguous and requires a subjective element.

Mark

8:44 am  
Blogger Zachriel said...

gpuccio: As we are still in a pioneering stage of CSI theories and research, it's perfectly natural that there is not universal consensus on the various approaches.

Keep in mind that many in the ID Community consider CSI to be a conclusive argument for a very strong claim. Hence, rigor is essential.

If it's just speculation, then you might want to remind Dembski and others that they don't yet have a rigorous argument.

I note that no one is willing to step me through Dembski's calculation.

gpuccio: My idea is that CSI is a reality which can be defined and measured in slightly different ways, like many other scientific realities.

That's fine. But I'm still unclear on your calculation. Can we calculate the CSI of an arbitrary string as Dembski suggests?

1:17 pm  
Blogger gpuccio said...

Zachriel:

"Can we calculate the CSI of an arbitrary string as Dembski suggests?"

As for me, I can calculate it only if I am aware of an explicit functional specification, and if I can in some way calculate, or at least reasonably approximate, the target space for that function. But maybe Dembski can do better. That's just my approach.

1:51 pm  
Blogger Zachriel said...

gpuccio: As for me, I can calculate it only if I am aware of an explicit functional specification, and if I can in some way calculate, or at least reasonably approximate, the target space for that function.

Wouldn't a "full house" have a function within the game of poker?

Let's assume you were the first to discover nylonase, a sequence of about 400 amino acids. It has no evident homologs and it's the only one ever discovered. What is the CSI of nylonase?

7:32 pm  
Anonymous Anonymous said...

I can see the argument gpuccio and KF are making, and I see why, on the face of it, it's convincing. Visualising the search space as 'islands of function', and evolution as simply a random walk across it, does give the impression that leaps are quite impossible.

I reckon that people may be mixing up 'fitness space' with 'function space', however. (I say this because I was up until now. :) )

Fitness space, for evolution, is normally the landscape with hills and dips where height is fitness. Fitness is really just the success of offspring.

But CSI doesn't seem to be defined on fitness space, but rather on 'function space', where height means 'ability to perform a function'.

And what this means is that for any organism, there are multiple 'function spaces'. Ability to spin a flagellum. Ability to tolerate poison. and so forth. Fitness space is kind of the superset of all these.

So it seems to me that you cannot call evolution a random but directed walk across 'function space', because it's actually walking several at the same time, and can even walk up in one but down in another (i.e. it can discard or stop relying on some functions if others are more beneficial).

2:56 am  
Anonymous Anonymous said...

There are many problems with the island metaphor:

1. As Venus just mentioned, the fitness landscape is a composite of many "function spaces", not just one.

2. The environment is neither homogeneous nor static. Conditions change across time and space, and so does the fitness landscape.

3. Species co-evolve with others. As each species evolves, it changes the fitness landscapes of the others in the ecosystem.

4. Fitness is not a function of one's position on the surface of a two-dimensional ocean, as the island metaphor implies. The image of neo-Darwinian evolution as an island-hopping journey is therefore extremely misleading.

NDE actually moves through an n-dimensional space, where n is a very large number. There are many more reachable nearby points in an n-dimensional space than there are on the two-dimensional surface of a genotypic "ocean", and NDE can exploit the full dimensionality of the space it operates in.

Furthermore, the availability of so many dimensions means that it is much harder to get stuck on a local maximum, because true local maxima are so rare. A point has to be a maximum with respect to all possible directions of movement in order to qualify as a local maximum.
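
A toy model makes the point. Treat each genotype as having n neighbours with independent random fitness values; the genotype is a local maximum only if it beats all n of them, which happens with probability 1/(n+1). A quick simulation (i.i.d. random fitness is of course a gross simplification of real landscapes):

```python
import random

def is_local_max(n: int) -> bool:
    """Draw i.i.d. fitnesses for a point and its n neighbours; return
    True if the point beats every neighbour."""
    here = random.random()
    return all(here > random.random() for _ in range(n))

def fraction_local_maxima(n: int, trials: int = 100_000) -> float:
    return sum(is_local_max(n) for _ in range(trials)) / trials

for n in (2, 10, 100, 1000):
    print(n, fraction_local_maxima(n))   # ~1/(n+1): maxima vanish as n grows
```

With 2 neighbours, a third of random points are traps; with 1000, about one in a thousand.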

Well-chosen metaphors can be an aid to understanding, but the island metaphor misleads more than it clarifies.

7:55 am  
Blogger gpuccio said...

Venus and Keith:

Some of the things you say are quite reasonable, but nonetheless not very relevant. I don't want to start a long discussion here, but I will just offer some reflections (anyway, this is a very important subject to discuss):

1) It is true that the space of a function is only a subset of the space for "any possible function", but I really don't believe that the concept of "any possible function" is so relevant as darwinists think. We should anyway reason on realistic models. Please, see my first two posts on this thread (especially the second one) for further inputs. Moreover, just consider my proposal for a "true" algorithmic simulation of RV and NS (I think it should be somewhere in this post), where we should rely on the ability of some digital replicator to gain further complexity by "any possible new function" which implies a true, spontaneous reproductive advantage in some digital environment, where the environment must know "nothing" about the replicator or about any anticipated function. That would be a true model of the "any possible function" landscape. Do you think it would work? Why not try that?

2) Environmental changes are certainly important where NS is supposed to act, but we should remember that most biological functions are fundamental in most possible environments, and are conditioned more by the general structure of the replicator (body plan, or just biochemical solutions) than by some change in environment. The true origin of NS is the replicator itself, although the replicator certainly has to adapt to its environment. For instance, most biochemical mechanisms have to satisfy biochemical laws (think of all the cell machinery for DNA replication, transcription, translation, post-translational modifications, and so on). The transition from unicellular beings to multicellular beings would require many complex engineering adaptations, whatever the environment and its constraints. What I mean is that many constraints are given by the function itself, and not by the environment. And the already existing structure of the replicator imposes severe constraints on what new functions can be constructively implemented, whatever the environment.

3) Regarding multidimensional landscapes, I believe that does not change much. The problem remains always the same: what random changes can really bring a reproductive advantage to a replicator, at each step, in each stage, with some definite environment? It is not enough to "hope" that in a multidimensional space anything is possible. You have to show that with a real model, which can be applied to what we really know. Behe has done very good empirical work on that in TEOE.

In other words, well-chosen mathematical abstractions can be an aid to understanding, but the multidimensional landscape metaphor misleads more than it clarifies.

3:43 pm  
